Tag: ethics

Why museums should repatriate fossils

In two separate expeditions 150 years ago, soldiers, scientists and mineral prospectors entered the lands of Lakotan people. This was just six years after the United States had formally recognized Lakotan sovereignty in the region.

In July 1874 — in the better known of the two expeditions — an ambitious young lieutenant colonel named George Armstrong Custer led more than 1,000 heavily armed troops into the Black Hills, a mountain range in what is now western South Dakota and eastern Wyoming. The official aim of the expedition was to map the area and find a suitable location to build a military fort, “to take the hostile backbone out of these unruly savages”, according to a high-ranking army commander¹. But Custer had also recruited a team of geologists, mineral prospectors and journalists to accompany his soldiers, in the hope of precipitating a gold rush.

The second expedition was led by Othniel Charles Marsh, a pre-eminent palaeontologist at Yale University in New Haven, Connecticut. In November that year, Marsh unearthed two tonnes of prehistoric fossils and shipped them to the Yale Peabody Museum of Natural History.

Custer’s expedition is often used to demonstrate the ruthless tactics deployed by the United States to colonize Native American lands. Marsh’s venture is generally seen in a more benign light, as an effort to further humanity’s understanding of Earth’s history. But a closer look at what happened in 1874, and in the years that followed, reveals that Marsh also contributed to the dispossession of Native American people. He extracted countless specimens from Lakotan lands without permission. When he shipped the fossils to Connecticut, most of the Lakotan meanings and stories tied to them were stripped away. Perhaps most egregiously, Marsh’s specimens were subsequently used to support an erroneous, racist theory of evolutionary progress that was deployed to justify the imperial expansion of the United States.

Around the world, archaeologists and anthropologists have taken steps to confront the darker side of their disciplines’ histories. In the United States, as of this January, federal law mandates that museums obtain “free, prior, and informed consent” from lineal descendants, American Indian Tribes and Native Hawaiian organizations before they exhibit human remains or cultural items (see go.nature.com/3wcvcvh). A similar reckoning for the Earth sciences — including a rethink of where and how natural history collections are curated — could help to rebuild Indigenous communities’ trust in science globally. It could also promote a greater diversity of perspectives on the natural world.

Material motives

According to Lakotan people (the Native American Nation now has more than 115,000 citizens), they have always lived in Paha Sapa, as they call the Black Hills. When strangers first started entering their hunting grounds — often in search of material wealth — Lakotans exchanged animal pelts with them for weapons and other goods. But over time, the flow of migrants led to rising tensions, particularly around the hunting of buffalo (Bison bison)².

Lakotan people were not just dependent on the buffalo for their livelihoods; they also viewed them as relatives. Whereas the region’s Indigenous people agreed not to hunt animals beyond what was needed, migrants began slaughtering buffalo on a massive scale, in part to limit the power and mobility of the Indigenous groups.

Othniel Charles Marsh (left) posing with Mahpiya Luta in a photographer’s studio. Credit: National Portrait Gallery, Smithsonian Institution

Hoping to curb the influx of migrants, a leader named Mahpiya Luta (Red Cloud), the head of a group of Lakotan people called the Oglala, forged a powerful alliance with the Cheyenne and Arapaho peoples, two other Native American Nations. When it became clear that the United States was failing to prevail on the battlefield, the federal government sued for peace and convened a treaty council in 1868 at Fort Laramie, in present-day Wyoming. After this meeting, the United States agreed to recognize Lakotan sovereignty in a swathe of territory about the size of Spain. This region, covering portions of what is now South Dakota, North Dakota, Montana, Wyoming and Nebraska, includes the Black Hills and the White River Badlands, a geological formation that is rich in vertebrate fossils, which range from 27 million to 37 million years old.

Custer’s expedition six years later almost immediately attracted countless migrants to the treaty lands. In August 1874, before Custer’s scientists had even finished their work, expedition journalists dispatched several reports to newspapers across the United States touting the region’s rich mineral resources. As well as making front-page headlines in newspapers such as the New York Tribune, Custer’s exploits were publicized in books, pamphlets, maps and advertisements. Some of the most sensational reports were printed by the US Land Office, as well as by railroad companies that sought to profit from westward migration.

For Marsh, it wasn’t rumours of the Black Hills’ rich mineral deposits that drew him to the region, but reports from military surveyors of a “vast deposit of fossil remains” in northern Nebraska³.

Custer had actually invited Marsh to join his July expedition. But Marsh had declined, sending his most promising student, George Bird Grinnell, instead. Later that year, however, Marsh set out for the treaty lands. When he arrived at the Red Cloud Agency, a military post where the US government distributed money and supplies to Indigenous people, Marsh found that the assembled Indigenous groups were highly suspicious of his interest in fossils. According to a 22 December 1874 article in the New York Tribune, a meeting was convened during which White Tail, one of the Lakotan leaders, “sprang at once to his feet”, declaring “that the proposed bone-seeking was merely a ruse to begin digging for gold and invading the Black Hills”.

After a tense and protracted negotiation, Red Cloud agreed to let Marsh enter the Badlands as long as he was escorted by Lakotan guides, and as long as he promised to tell federal officials in Washington DC that a government officer in charge of the Red Cloud Agency was distributing poor-quality supplies and embezzling funds. According to the Tribune article, Marsh accepted, but when it became clear that the Lakotans were growing increasingly suspicious of his true motives, he resolved to leave for the White River Badlands without his guides, under the cover of darkness.

Over the next few days, Marsh and his military escort excavated cartloads of fossils — including the massive remains of extinct ungulates that Marsh named Brontotheriidae — all under the watchful eye of Lakotan warriors who had quickly realized what he was up to. Writing in the 17 November 1874 edition of the New York Tribune, a newspaper journalist even joked that “the search for fossils instead of that for gold may unlock the gates of the Black Hills”.

Lakotan people depended on the hunting of buffalo for food and pelts. Credit: Mark Newman/Getty

Marsh was not the only scientist to extract knowledge and research materials from Lakotan treaty lands. Many fossils have been taken from the region over the years, often without permission⁴. After the initial discoveries of 1874, Marsh sent his assistant John Bell Hatcher to find more specimens for the Peabody Museum. Over the next several decades, scientists from the American Museum of Natural History in New York City also collected specimens in the treaty lands, resulting in the publication in 1929 of a massive treatise on the region’s extinct ungulates by the museum’s president, Henry Fairfield Osborn⁵. But Marsh’s findings helped to support the imperial expansion of the United States in a more insidious way, too.

According to Marsh and his colleagues, extinction had a role in evolutionary advancement by creating ecological space for new and more advanced lineages. This idea — that certain less-evolved races are doomed to extinction, known as ‘racial senescence’ — was widely embraced by intellectual elites at the time. And it was used to support claims that North America’s Indigenous nations were naturally destined to disappear⁶. In other words, a speculative theory about evolution bolstered by fossils extracted — illegally — from treaty lands was invoked to justify the dispossession of those same lands by portraying their original inhabitants as naturally doomed to extinction.

Genuine collaborations

Today, the involvement of scientists such as Marsh in the colonization of North America is rarely discussed. A recently renovated exhibition about vertebrate fossils at the Peabody Museum, for instance, acknowledges the historical tensions between scientists and Native American people. But it also suggests that a more collaborative relationship existed between Marsh and Lakotan people than the evidence indicates. For example, one sign explains that Marsh was inspired by Lakotan stories about thunder beings (Wakiŋyaŋ) when he decided to use the Greek word for thunder beast to name the extinct ungulates he had excavated ‘brontotheres’. Wakiŋyaŋ are bird-like spirits whose conflict with ancient monsters called uŋhcegila is often used by Lakotan people to explain the abundance of fossils in the White River Badlands.

In our view, Earth scientists and natural history museums should foster further conversations about the ways in which their disciplines and institutions are entangled with the violent history of colonialism. Some have already begun to take important steps in this direction. The Natural History Museum in Berlin, for example, has created an interdisciplinary research centre, involving historians, anthropologists and natural scientists, to explore the provenance of its collections.

We encourage more interdisciplinary research of this kind. More museums and universities should create scholarships to make their collections easily accessible to students and researchers from the lands from which they were extracted. Furthermore, scientists could do much more to forge reciprocal relationships with people on whose lands they wish to work — for instance, by discussing potential collaborations with local partners before submitting a funding proposal. Finally, we think that where and how natural history collections are curated should be re-evaluated.

There are clear legal arguments for repatriating fossils extracted from the White River Badlands. The 1868 Treaty of Fort Laramie — which, according to a 1980 ruling by the US Supreme Court, remains in effect — explicitly sets aside the lands around the Black Hills for the “absolute and undisturbed use and occupation” of Lakotan people (see go.nature.com/4ar2cat). Why should this language not cover the extraction of specimens?

Legends around the fossils include stories about a great race involving animals. Credit: Angela Babby

Implicitly or explicitly, it is often assumed that because Earth sciences deal with rocks and fossils, rather than with artefacts or human remains, they should not be held to the same ethical standards as anthropology and archaeology. In 1999, for example, under the Native American Graves Protection and Repatriation Act (NAGPRA), the Confederated Tribes of the Grand Ronde Community of Oregon asked for the return of a sacred meteorite called Tomanowos, or ‘sky person’. Describing the specimen as “a feature of the landscape”, the American Museum of Natural History denied the tribe’s claim and filed a federal lawsuit in 2000 to clarify that “NAGPRA does not cover this type of object”⁷. In support, an editorial in The New York Times stated that the meteorite “is a celestial object” that “belongs to us all, and is best left in the custody of the museum”. Similar arguments are often made for fossils that pre-date the evolution of humans⁸.

This line of reasoning overlooks the very different relationships between nature and culture, prehistory and the present that characterize the cosmology of Lakotans and many other Indigenous cultures worldwide.

The Lakotan creation story — the narrative of how Lakotan people came to be, passed down through generations — begins with Iŋyaŋ (stone). Iŋyaŋ existed before there was anything else, but he wanted something over which he could exercise his power. So he opened his veins and spread his blood in a great disk all around himself to create Maka (Earth). Because Iŋyaŋ was unable to staunch the flow of blood, his body became brittle and dry, and he was scattered all over Maka. From this initial sacrifice, everything else followed, including the sky, the Sun, the Moon, the plants and the animals⁹.

All animals came together for the start of the great race, according to legend. Credit: Del Iron Cloud

According to Luther Standing Bear, who worked to uphold Lakotan culture and sovereignty in the late nineteenth century, the creation story reminds Lakotans that they came from Maka, which in turn provides “the foundation for the love they bore for earth and all the things of the earth”¹⁰. These teachings are reinforced by ceremonies, such as the sweat lodge, a ritual that involves pouring water over hot stones while offering prayers to Tuŋkasila (grandfather or creator). As Albert White Hat, a respected teacher of Lakotan language and culture, explains¹¹: “Whether it’s the spirit of the eagle, or the coyote, or the spider, whatever spirit comes in the lodge, I address them as Tuŋkasila … because they represent the beginning of time until today. And they are my relatives, they are dear to me.”

The Lakotan saying Mitakuye Oyasiŋ, or ‘We are all related’, extends beyond humans to encompass everything in the treaty lands — including plants and animals, stones, the sky, fossils, thunderstorms and Earth itself. All are part of a complex network of relationships forged out of mutual dependence and reciprocity. In our view, museums have an ethical duty to return fossils extracted from the White River Badlands, not just because of the 1868 agreement, but because fossils are an integral part of the system of relationships that brings order to all life in the treaty lands.

Diverse world views

The realization that extractive capitalism is unsustainable is leading to a growing interest in how Lakotans and other Indigenous people relate to Earth². By partnering with Native American Nations to build museums on tribal lands, institutions such as the Peabody Museum could help to develop exhibits that present a much richer picture of the stories, meanings and interpretations tied to fossils than is currently possible.

Visitors could learn, for example, how a series of uplifts followed by erosion, which extend as far back as the Precambrian period (more than 500 million years ago), helped to create the Black Hills. But they could learn about the Lakotan narrative of how the Black Hills came to be, too.

“Far back in the first sunrise of time,” Lakotan storyteller James LaPointe explains in his 1976 book, Legends of the Lakota, the treaty lands were disordered and chaotic. After much deliberation, humanity held a great race to decide who could eat whom. As the seething mass of animals raced round and round, Earth began to quiver and, with “a thunderous roar, it burst open”. As flames rose all around them, the animals “lay dead in their tracks, covered with smoldering ashes and lava”¹². As a result of all this, “there are many large bones still lying along the historic track”, including the “huge bones of Unkche Ghila, which, once upon a time, roamed these prairie lands” and “can be found in the badlands to the east and south of the Black Hills”.

As well as being repositories for repatriated specimens, museums on Indigenous lands could employ local researchers, provide educational resources for local students and develop innovative exhibitions by involving Indigenous poets and artists — ultimately providing an inclusive and informative resource for everyone.

June 18, 2024
Elite researchers in China say they had ‘no choice’ but to commit misconduct

“I had no choice but to commit [research] misconduct,” admits a researcher at an elite Chinese university. The shocking revelation is documented in a collection of several dozen anonymous, in-depth interviews offering rare, first-hand accounts of researchers who engaged in unethical behaviour — and describing what tipped them over the edge. An article based on the interviews was published in April in the journal Research Ethics¹.

The interviewer, sociologist Zhang Xinqu, and his colleague, criminologist Wang Peng, both at the University of Hong Kong, suggest that researchers felt compelled, and even encouraged, to engage in misconduct to protect their jobs. This pressure, they conclude, ultimately came from a Chinese programme to create globally recognized universities. The programme prompted some Chinese institutions to set ambitious publishing targets, they say.

The article offers “a glimpse of the pain and guilt that researchers felt” when they engaged in unethical behaviour, says Elisabeth Bik, a scientific-image sleuth and consultant in San Francisco, California.

But other researchers say the findings paint an overly negative picture of the Chinese programme. Zheng Wenwen, who is responsible for research integrity at the Institute of Scientific and Technical Information of China, under the Ministry of Science and Technology, in Beijing, says that the sample size is too small to draw reliable conclusions. The study is based on interviews with staff at just three elite institutes — even though more than 140 institutions are now part of the programme to create internationally competitive universities and research disciplines.

Rankings a game

In 2015, the Chinese government introduced the Double First-Class Initiative to establish “world-class” universities and disciplines. Universities selected for inclusion in the programme receive extra funding, whereas those that perform poorly risk being delisted, says Wang.

Between May 2021 and April 2022, Zhang conducted anonymous virtual interviews with 30 faculty members and 5 students in the natural sciences at three of these elite universities. The interviewees included a president, deans and department heads. The researchers also analysed internal university documents.

The university decision-makers who were interviewed at all three institutes said they understood it to be their responsibility to interpret the goals of the Double First-Class scheme. They determined that, to remain on the programme, their universities needed to increase their standing in international rankings — and that, for that to happen, their researchers needed to publish more articles in international journals indexed in databases such as the Science Citation Index.

Some universities treated world university rankings as a “game” to win, says Wang.

As the directive moved down the institutional hierarchy, pressure to perform at those institutes increased. University departments set specific and hard-to-reach publishing criteria for academics to gain promotion and tenure.

Some researchers admitted to engaging in unethical research practices for fear of losing their jobs. In one interview, a faculty head said: “If anyone cannot meet the criteria [concerning publications], I suggest that they leave as soon as possible.”

Zhang and Wang describe researchers using services to write their papers for them, falsifying data, plagiarizing, exploiting students without offering authorship and bribing journal editors.

One interviewee admitted to paying for access to a data set. “I bought access to an official archive and altered the data to support my hypotheses.”

An associate dean emphasized the primacy of the publishing goal. “We should not be overly stringent in identifying and punishing research misconduct, as it hinders our scholars’ research efficiency.”

Not the whole picture

The authors “hit the nail on the head” in describing the relationship between institutional pressure and research misconduct, says Wang Fei, who studies research-integrity policy at Dalian University of Technology.

But she says it’s not the whole picture. Incentives to publish high-quality research are part of broader reforms to the higher-education system that “have been largely positive”. “The article focuses almost exclusively on the negative aspects, potentially misleading readers into thinking that Chinese higher education reforms are severely flawed and accelerating research misconduct.”

Tang Li, a science- and innovation-policy researcher at Fudan University in Shanghai, agrees. The first-hand accounts are valuable, but the findings could be biased, she says, because those who agreed to be interviewed might have strong feelings and might not represent the opinions of those who declined.

Zheng disagrees with the study’s conclusions. In 2020, the government issued a directive for Double First-Class institutes. This states specifically that evaluations should be comprehensive, and not just focus on numbers of papers, she says. Research misconduct is a result not of the Double First-Class initiative, but of an “insufficient emphasis on research integrity education”, says Zheng.

Punishing misconduct

The larger problem, says Xiaotian Chen, a library and information scientist at Bradley University in Peoria, Illinois, is a lack of transparency and of systems to detect and deter misconduct in China. Most people do the right thing, despite the pressure to publish, says Chen, who has studied research misconduct in China. The pressure described in the paper could just be “an excuse to cheat”.

The Chinese government has introduced several measures to crack down on misconduct, including defining what constitutes a violation and specifying appropriate penalties. It has also banned cash rewards for publishing in high-impact journals.

Wang Peng says that government policies need to be more specific about how they define and punish different types of misconduct.

But Zheng says that, compared with those that apply in other countries, “the measures currently taken by the Chinese government to punish research misconduct are already very stringent”.

The authors also ignore recent government guidance for elite Chinese institutions to break with the tendency of evaluating faculty members solely on the basis of their publications and academic titles, says Zheng.

Tang points out that the road to achieving integrity in research is long. “Cultivating research integrity takes time and requires orchestrated efforts from all stakeholders,” she says.

And the pressure to publish more papers to drive up university rankings “is not unique to China”, says Bik. “Whenever and wherever incentives and requirements are set up to make people produce more, there will be people ‘gaming the metrics’.”

June 11, 2024
Meta AI system is a boost to endangered languages — as long as humans aren’t forgotten

Machine translation works well for widely spoken languages — but languages with a smaller digital footprint struggle. Credit: Zhang Hengwei/China News Service/Getty

In this week’s Nature, a team that includes researchers at the technology company Meta describes a method of scaling up machine translation of ‘low-resourced’ languages for which there are few readily available digital sources¹. The company’s automated translation systems will now include more than 200 languages, many of them not currently served by machine-translation software. These include the southern African language Tswana; Dari, a type of Persian spoken in Afghanistan; and the Polynesian language Samoan.

It’s an important step that helps to close the digital gap between such neglected languages and languages that are more prevalent online, such as English, French and Russian. It could allow speakers of lower-resourced languages to access knowledge online in their first language, and possibly stave off the extinction of these languages by shepherding them into the digital era.

But machine-learning models are only as good as the data that they are fed — which are mainly created by humans. As machine-translation tools develop, the companies behind them must continue to engage with the communities they aim to serve, or risk squandering the technology’s promise.

Of the almost 7,000 languages spoken worldwide, about half are considered to be in danger of going extinct. A 2022 study² predicts that the rate of language loss could triple within 40 years. The dominance of just a few languages on the Internet is one driving force: it’s estimated that more than half of all websites are in English, and the top ten languages account for more than 80% of Internet content.

The researchers, based at Meta AI, Meta’s research division in New York City, the University of California, Berkeley, and Johns Hopkins University in Baltimore, Maryland, set out to expand the number of low-resource languages that their model translates as part of Meta AI’s ‘No Language Left Behind’ programme. They selected languages that were present in Wikipedia articles, but had fewer than 1 million sentences of example translations available online.

This work doubles the number of languages made available by a previous iteration³, and makes improvements to translation quality. The researchers employed professional translators and reviewers to create a ‘seed’ data set in 39 of the languages, and developed a technique that allowed them to mine web data to create parallel data sets in the remaining languages. They also generated a list of some 200 ‘toxic’ words for each language, to identify translations that could, for example, constitute hate speech.

The involvement of human specialists is time-consuming and expensive — but crucial. Without them, algorithms would be trained on poor-quality data generated by artificial intelligence (AI), creating more errors. Models would then harvest this content and create even more poor-quality text. William Lamb, a linguist and ethnographer at the University of Edinburgh, UK, who was not involved in Meta AI’s programme, says that this is already happening for Scottish Gaelic, for which most online content is generated by AI. Scottish Gaelic is one of the low-resourced languages in the Meta programme for which the content was professionally translated. Human expertise is also important for languages that lack certain vocabulary. For example, many African languages do not have bespoke terms for scientific concepts. The research project Decolonise Science employed professional translators to translate 180 scientific papers into 6 African languages. It was initiated by Masakhane, a grassroots organization of researchers interested in natural language processing.

Such specialists are in short supply, however. This is one reason why researchers and technology companies must include communities that speak these languages, not just in the process of creating their machine-translation systems, but also as those systems are used, to reflect how real people use those languages. Researchers whom Nature spoke to say that they are concerned that not doing so will hasten the demise of the languages and, by extension, their associated cultures. Without continued engagement, working on machine translation could become another form of ‘parachute science’, in which researchers in high-income countries exploit communities in low-income countries.

“The words, the sentences, the communication, are void of the values and beliefs encoded in the languages,” says Sara Child, a specialist in language revitalization at North Island College on Vancouver Island in Canada and a member of the Kwakwaka’wakw people. As AI propels more languages into the digital space, “I worry that we lose even more of ourselves”. This human element must not be ignored in the rush towards a universal translation system.

June 5, 2024
Scaling neural machine translation to 200 languages
Data

This section describes the steps taken to design our language identification system and bitext mining protocol.

Language identification

To train language identification models, we used fasttext³³,⁵¹, which has been widely used for text classification tasks because of its simplicity and speed. We embedded character-level n-grams from the input text and leveraged a multiclass linear classifier on top. The lightweight nature of fasttext enables our LID models to handle web-scale data. Furthermore, a linear model has the benefit of being easily explainable, allowing us to trace any classification error back to its root cause. This is instrumental in addressing common pitfalls that arise when detecting language on web corpora³².

Classifier design

We experimented with two different designs. First, we used a combination of multiple binary classifiers in which the final decision was obtained by selecting the language with the highest score after applying a threshold. We applied threshold optimization so that when the confidence of a classifier is low, the corresponding language is not considered for the final decision. A sentence was filtered out if none of the classifiers surpassed its threshold. Second, we built a multiclass classifier using softmax over all possible languages. In this case, the threshold optimization is done after the softmax.

Our results directed us to focus on the second approach, which offers several advantages. First, changing the threshold for one language did not affect the performance of the others (which is not true in the first setting). Second, this approach generalizes better to out-of-domain data, which is our primary use case (Wikipedia → web data). Finally, a single classifier has the added benefit of being computationally simpler, thus streamlining the language identification process.
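The second design can be illustrated with a minimal sketch: a softmax over raw per-language scores, followed by a per-language confidence threshold. This is not the authors' implementation. The function names, the language codes, the scores and the threshold values are all assumptions chosen for illustration.

```python
import math


def softmax(scores):
    """Turn raw per-language scores into probabilities that sum to 1."""
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {lang: math.exp(s - top) for lang, s in scores.items()}
    total = sum(exps.values())
    return {lang: e / total for lang, e in exps.items()}


def predict_language(scores, thresholds, default_threshold=0.5):
    """Return (language, probability), or (None, probability) if filtered out.

    The threshold is applied after the softmax, as in the second design.
    `thresholds` maps a language code to its tuned cut-off (hypothetical values).
    """
    probs = softmax(scores)
    best = max(probs, key=probs.get)
    if probs[best] >= thresholds.get(best, default_threshold):
        return best, probs[best]
    return None, probs[best]


# Hypothetical raw scores for one sentence.
scores = {"eng_Latn": 4.0, "min_Latn": 1.0, "fra_Latn": 0.5}
print(predict_language(scores, {"eng_Latn": 0.9}))
print(predict_language(scores, {"eng_Latn": 0.95}))
```

In this sketch a sentence is dropped only when the winning language falls below its own threshold, so tuning one language's cut-off leaves the decision rule for every other language unchanged.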

Training data and handling massive class imbalance

We used publicly available datasets to train our LID system, partially covering our languages of interest. The public datasets deployed were mostly built from web pages such as CommonCrawl. We then supplemented these with NLLB-Seed data (Supplementary Information B) for any missing languages. However, this supplementation is insufficient in ensuring balance in the raw training data³²,³⁰. For example, English alone represents 10.1% of our training data, whereas Minangkabau (Latin script) represents only 0.06%. Following ref. ¹⁰, we experimented with multiple settings of temperature upsampling for underrepresented languages, in which sentences from a language l representing p_l per cent of the data set are sampled proportionally to \(p_l^{1/T}\). Optimal performance was obtained at 1/T = 0.3 (for more details, see section 5.1 of ref. ³⁴).
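The effect of temperature upsampling can be seen in a small sketch that raises each language's share to the power 1/T and renormalizes. The English and Minangkabau shares are the figures quoted above. The third language and its share are hypothetical, and the function name is an assumption for illustration.

```python
def temperature_sampling_probs(shares, inv_temperature=0.3):
    """Map raw data shares p_l to sampling probabilities proportional to p_l ** (1/T)."""
    weights = {lang: share ** inv_temperature for lang, share in shares.items()}
    total = sum(weights.values())
    return {lang: w / total for lang, w in weights.items()}


# Shares in per cent of the training data; "xxx_Latn" is a made-up mid-resource language.
shares = {"eng_Latn": 10.1, "min_Latn": 0.06, "xxx_Latn": 1.0}
probs = temperature_sampling_probs(shares, inv_temperature=0.3)

raw_ratio = shares["eng_Latn"] / shares["min_Latn"]
sampled_ratio = probs["eng_Latn"] / probs["min_Latn"]
print(f"English/Minangkabau ratio before: {raw_ratio:.1f}, after: {sampled_ratio:.2f}")
```

Because the exponent is below 1, the gap between high- and low-resource languages shrinks sharply, while the ordering of languages by size is preserved.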

Training parameters

Our best-performing model was trained with softmax loss over two epochs with a learning rate of 0.8 and embeddings with 256 dimensions. We discarded words with fewer than a thousand occurrences after upsampling and selected a minimum and maximum character n-gram length of two and five, respectively (the n-grams were assigned a slot in buckets of size 1,000,000). In fasttext, a ‘word’ is a string delimited by spaces; for a non-segmenting language, the whole sentence counts as a single ‘word’ (and we take character n-grams). All hyperparameters were tuned on FLORES-200 dev (see section 5.1.2 of ref. ³⁴).
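The n-gram settings above can be made concrete with a short sketch that extracts character n-grams of length two to five from a word and hashes each into one of 1,000,000 buckets. This illustrates the general hashing-trick idea only. The boundary markers and the FNV-1a hash are stand-in assumptions, not a reproduction of fasttext's internal code, and the function names are hypothetical.

```python
def char_ngrams(word, min_n=2, max_n=5):
    """List all character n-grams of a word, with '<' and '>' as boundary markers."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(min_n, max_n + 1)
        for i in range(len(token) - n + 1)
    ]


def fnv1a_32(text):
    """32-bit FNV-1a hash of the UTF-8 bytes (an assumed stand-in hash function)."""
    h = 2166136261
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) % 2**32
    return h


def ngram_buckets(word, num_buckets=1_000_000):
    """Map each character n-gram of a word to a bucket index in [0, num_buckets)."""
    return [fnv1a_32(gram) % num_buckets for gram in char_ngrams(word)]


print(char_ngrams("ab"))
print(ngram_buckets("bahasa"))
```

With the fasttext Python bindings, the reported settings would plausibly correspond to `train_supervised` arguments such as `lr=0.8`, `epoch=2`, `dim=256`, `minCount=1000`, `minn=2`, `maxn=5`, `bucket=1000000` and `loss="softmax"`, although the text does not give the exact invocation.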

Improving LID with linguistic analysis

Language identification is a challenging task in which numerous failure modes exist, often exacerbated by the gaps between the clean data on which LID models are trained and noisy data on which LID models are applied. In other words, LID models trained in a supervised manner on fluently written sentences may have difficulty identifying grammatically incorrect and incomplete strings extracted from the web. Furthermore, models can easily learn spurious correlations that are not meaningful for the task itself. Given these challenges, we collaborated closely with a team of linguists throughout different stages of LID development to identify proper focus areas, mitigate issues and explore solutions (see section 5.1.3 of ref. ³⁴).

Bitext mining

The overall approach for bitext mining focused on starting with a massively multilingual sentence encoder teacher model and adapting it to several different low-resource student models. This approach enabled us to add low-resource languages without competing with high-resource languages for capacity. Doing so circumvents the need to retrain the entire model from scratch while maintaining compatibility with the multilingual embedding spaces for subsequent mining. Extended data Fig. 1 summarizes the overall architecture of the teacher–student approach. The teacher, LASER2, is an improved version of the open-source LASER encoder (https://github.com/facebookresearch/LASER). The original training procedure³⁶ was adapted to include SentencePiece tokenization (including a vocabulary of 7,000 tokens) and the upsampling of low-resource languages.

The architecture of the five-layer BiLSTM encoder and the max pooling method to obtain sentence embeddings were left unchanged. The training was then performed on the same 93 languages with public resources obtained from OPUS⁵². See ref. ³⁶ for details on the original LASER training procedure. Training of the students followed the approach described in greater detail in ref. ²¹, summarized below:
- students specialized in one language or several similar languages;
- students were randomly initialized because we wanted to handle low-resource languages for which we did not have a pre-trained language model;
- students may have a dedicated SentencePiece vocabulary different from the teacher to better accommodate scripts and tokens in the student languages;
- as we used cosine distance for bitext mining (Fig. 1), students learnt to minimize the cosine loss with the teacher;
- students can have an MLM loss to leverage student language monolingual data (Fig. 1).

Training parameters

Our student encoders used a 12-layer transformer with a hidden size of 1,024, four attention heads, and around 250 million parameters. All students were trained with available bitexts in their respective language, complemented by 2 million sentences of English/English and English/Spanish. The motivation behind this approach is to anchor the students to the English embedding space, increasing robustness by including English/Spanish bitexts from CCMatrix and allowing for the joint learning of new languages. This technique is particularly useful when only limited amounts of bitexts are available to train the students. Teacher–student training was performed on 16 GPUs using the ADAM optimizer, a learning rate of 0.0005 and a batch size of 10,000. We trained student encoders for 148 languages and named these models LASER3.
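The teacher–student objective can be pictured with a small sketch: the student embeds a sentence, the teacher embeds its translation, and the student is penalized by how far the two embeddings are from each other in cosine terms. The exact loss form (here 1 − cos, averaged over a batch) and the function names are assumptions for illustration only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cosine_distillation_loss(student_embs, teacher_embs):
    """Mean of (1 - cos) over paired student and teacher sentence embeddings."""
    losses = [
        1.0 - cosine_similarity(s, t)
        for s, t in zip(student_embs, teacher_embs)
    ]
    return sum(losses) / len(losses)


# Toy embeddings: the first pair is aligned, the second is orthogonal.
student = [[1.0, 0.0], [0.0, 1.0]]
teacher = [[2.0, 0.0], [1.0, 0.0]]
print(cosine_distillation_loss(student, teacher))
```

Because the teacher stays fixed, minimizing this quantity pulls each student into the teacher's embedding space, which is what keeps the resulting encoders mutually compatible for mining.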

Proxy metric for new encoders

Mined bitexts were subsequently used to improve translation quality for the languages of NLLB-200. However, mining and NMT training are computationally expensive, and it is intractable to perform this evaluation systematically for many different sentence encoder variants. As an evaluation proxy, we used a mining-based multilingual similarity search error rate, referred to here as xsim. In contrast to cosine accuracy, which aligns embeddings based on the highest cosine score, xsim aligns source and target embeddings based on the highest margin score, which has been shown to be beneficial in mining⁵³. The margin-based score is defined as

$${\rm{score}}(x,y)={\rm{margin}}\left(\cos (x,y),\sum _{z\in N{N}_{k}(x)}\frac{\cos (x,z)}{2k}+\sum _{v\in N{N}_{k}(\,y)}\frac{\cos (\,y,v)}{2k}\right)$$

(1)

where x and y are the source and target sentences, and NN_k(x) denotes the k nearest neighbours of x in the other language. We set k to 4. All xsim results are calculated on FLORES-200 devtest, using the ratio margin, where margin(a, b) = a/b. Moreover, all scores are calculated for translations into English (that is, xxx → eng). English is encoded by the teacher, and the other language is encoded by the LASER3 student. To facilitate further research using xsim, we also provide this evaluation method as an open-source resource (https://github.com/facebookresearch/LASER/).
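The following sketch shows how equation (1) and the xsim error rate could be computed on small in-memory embedding lists. It assumes the ratio margin and aligned test sets (source i is the translation of target i); the function names are hypothetical, and the released LASER tooling should be preferred for real evaluations.

```python
import math


def cos(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def margin_score(x, y, sources, targets, k=4):
    """Ratio margin: cos(x, y) divided by the summed neighbourhood terms of equation (1)."""
    def neighbour_term(query, candidates):
        # Sum of the k highest similarities, scaled by 1 / (2k).
        sims = sorted((cos(query, c) for c in candidates), reverse=True)[:k]
        return sum(sims) / (2 * k)

    # Neighbours of x are searched among targets, neighbours of y among sources.
    # Assumes the denominator is positive, as with typical sentence embeddings.
    return cos(x, y) / (neighbour_term(x, targets) + neighbour_term(y, sources))


def xsim_error_rate(sources, targets, k=4):
    """Fraction of sources whose best-scoring target is not the aligned one."""
    errors = 0
    for i, x in enumerate(sources):
        scores = [margin_score(x, y, sources, targets, k) for y in targets]
        if scores.index(max(scores)) != i:
            errors += 1
    return errors / len(sources)


src = [[1.0, 0.1, 0.1], [0.1, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(xsim_error_rate(src, src, k=2))                       # aligned toy data
print(xsim_error_rate(src, [src[1], src[0], src[2]], k=2))  # first two targets swapped
```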

End-to-end encoder evaluation

Once we had identified the best sentence encoder for each language using the xsim scores, we performed mining, added the mined data to the existing bitexts and trained a bilingual NMT system. Initial experiments indicated that a threshold on the margin of 1.06 seems to be the best compromise between precision and recall for most languages. For these NMT baselines, we do not apply extra filtering on the bitexts and leave this to the training procedure of our massively multilingual NMT system.

We did not attempt to optimize the architecture and parameters of the bilingual NMT systems to the characteristics of each language pair but used the same architecture for all. Therefore, the reported results should not be interpreted as the best possible ones given the available resources—they are mainly provided to validate the mined bitexts. We used a 12-layer encoder and decoder and trained for 100 epochs. Moreover, we looked for the best performance on the FLORES-200 development set and report detokenized BLEU on the FLORES-200 devtest.

Modelling

In this section, we first describe the multilingual machine translation task setup, which includes tokenization and base model architecture. Then, we outline how we leveraged conditional computation for massively multilingual machine translation with EOM regularization and our Curriculum Learning (CL) strategy for low-resource languages.

Task setup

We modelled multilingual NMT as a sequence-to-sequence task, in which we conditioned on an input sequence in the source language with an encoder and generated the output sequence in the expected target language with a decoder⁵⁴. With the source sentence S, source language ℓ_s, and target language ℓ_t in hand, we trained to maximize the probability of the translation in the target language T—that is, P(T∣S, ℓ_s, ℓ_t). Below, we discuss details of the (1) tokenization of the text sequences in the source and target languages; and (2) model architecture with the input and output designed specifically for multilingual machine translation. For further details on the task setup, such as the amount of training data per language pair, please refer to Supplementary Information F or section 8 of ref. ³⁴.

Segmentation with SentencePiece

To tokenize our text sequences, we trained a single SentencePiece model (SPM)⁵⁵ for all languages. We sampled a total of 100 million sentences from primary bitext data. To ensure low-resource languages are well-represented in the vocabulary, we downsampled high-resource and upsampled low-resource languages with a sampling temperature of five (ref. ¹⁰). Notably, vocabulary size is an important hyperparameter in multilingual translation models involving low-resource languages^56,57,58. The vocabulary size of our trained SPM model is 256,000. Such a large vocabulary ensures adequate representation across the wide spectrum of languages we support.

Model architecture

Our sequence-to-sequence multilingual machine translation model is based on the transformer encoder–decoder architecture⁵⁹. The encoder transforms the source token sequence into a sequence of token embeddings. Then, the decoder attends to the encoder output and autoregressively generates the target sentence token by token. More precisely, the encoder takes the sequence of tokens W = (w₁, …, w_S) and the source language ℓ_s, and produces a sequence of embeddings H = (h₁, …, h_S), which are then provided to the decoder with the target language ℓ_t to produce the target tokens V = (v₁, …, v_T) sequentially. In sum,

$$H={\rm{encoder}}(W,\,{{\ell }}_{{\rm{s}}}),$$

(2)

$$\forall i\in [1,\ldots ,T],\,{v}_{i+1}={\rm{decoder}}(H,\,{{\ell }}_{{\rm{t}}},\,{v}_{1},\,\ldots ,\,{v}_{i}).$$

(3)

Note that we prefixed the source sequence with the source language, as opposed to the target language, as done in previous work^10,60. We did so because we prioritized optimizing the zero-shot performance of our model on any pair of 200 languages at a minor cost to supervised performance. Empirically, we find zero-shot performance to be negatively affected when conditioning the encoder on the target language. When the source is conditioned on only the source language, the encoder generalizes better to pairs of source and target languages not encountered during training¹.
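The interface implied by equations (2) and (3) can be sketched as a greedy decoding loop in which the encoder sees the source-language tag and the decoder is conditioned on the target-language tag. The encoder and decoder below are toy stand-ins (hypothetical functions, not a trained model), and greedy decoding is an assumption made for brevity.

```python
def translate(encoder, decoder, source_tokens, src_lang, tgt_lang, max_len=10, eos="</s>"):
    # Equation (2): the encoder input is prefixed with the source language.
    hidden = encoder([src_lang] + source_tokens)
    # Equation (3): each new token depends on H, the target language and the prefix so far.
    output = []
    for _ in range(max_len):
        next_token = decoder(hidden, tgt_lang, output)
        if next_token == eos:
            break
        output.append(next_token)
    return output


def toy_encoder(tokens):
    """Stand-in encoder: returns the tokens themselves as 'embeddings'."""
    return list(tokens)


def toy_decoder(hidden, tgt_lang, prefix):
    """Stand-in decoder: copies source tokens one by one, marked with the target language."""
    source = hidden[1:]  # drop the language tag
    position = len(prefix)
    return f"{source[position]}@{tgt_lang}" if position < len(source) else "</s>"


result = translate(toy_encoder, toy_decoder, ["hello", "world"], "eng_Latn", "fra_Latn")
print(result)
```

Note that the target language never reaches the encoder in this layout, which mirrors the design choice described above.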

Conditional computation for multilingual machine translation

A massively multilingual translation (MMT) model uses the same shared model capacity to train on several translation directions simultaneously. While doing so can lead to beneficial cross-lingual transfer between related languages, it can also add to the risk of interference between unrelated languages^1,61. MoE models are a type of conditional computational models^62,63 that activate a subset of model parameters per input, as opposed to dense models that activate all model parameters per input. MoE models unlock marked representational capacity while maintaining the same inference and training efficiencies in terms of FLOPs compared with the core dense architecture.

However, as we increase the model capacity and the computational cost per update, the propensity for low or very low-resource languages to overfit increases, thus causing performance to deteriorate. In this section, we examine how we can use Sparsely Gated Mixture of Experts models^2,3,4,5,6,7 to achieve a more optimal trade-off between cross-lingual transfer and interference and improve performance for low-resource languages.

Sparsely gated mixture of experts

To build our MoE models, we substitute a quarter of the encoder and decoder feed-forward network layers with MoE layers, each with E distinct experts. We followed the Top-k-Gating algorithm in ref. ⁴ and dispatched each token to at most k = 2 experts. For more details on the training of MoE models, see Supplementary Information E.
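A minimal sketch of top-2 routing for a single token follows. It assumes a softmax router whose two largest probabilities are renormalized to sum to one; expert capacity limits, load-balancing terms and any randomized dispatch of the second expert are deliberately omitted, and all names are illustrative.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_gate(router_logits, k=2):
    """Return {expert_index: weight} for the k most probable experts."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda e: probs[e], reverse=True)[:k]
    norm = sum(probs[e] for e in top)
    return {e: probs[e] / norm for e in top}


def moe_layer(token, router_logits, experts, k=2):
    """Weighted sum of the outputs of the selected experts for one token."""
    out = [0.0] * len(token)
    for e, weight in top_k_gate(router_logits, k).items():
        expert_out = experts[e](token)
        out = [o + weight * y for o, y in zip(out, expert_out)]
    return out


def make_scaling_expert(factor):
    """Toy expert that multiplies its input by a constant."""
    return lambda x: [factor * v for v in x]


experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
print(moe_layer([1.0, 2.0], [0.0, 5.0, 5.0, 0.0], experts))
```

Only two of the four toy experts run for this token, which is the source of the compute savings relative to a dense layer of the same total size.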

Expert output masking

In this proposed regularization strategy, we masked the expert output for a random fraction (p_eom) of the input tokens. For input tokens with dropped expert outputs, the first and/or second expert is effectively skipped. As shown in the second panel of Extended data Fig. 2, we masked both experts for the first token (x₁ in red), chose not to mask any of the expert outputs for the second token (x₂ in blue) and in the final scenario, masked only one expert for the last token (x₃ in green).
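One possible reading of expert output masking is sketched below: during training, each selected expert's contribution to a token is dropped independently with probability p_eom, so a token can lose neither, one or both of its experts. The independent per-expert mask is an assumption made for illustration; the actual implementation may sample the mask differently.

```python
import random


def apply_eom(gated_outputs, p_eom, rng):
    """Combine expert outputs per token, masking each contribution with probability p_eom.

    gated_outputs: for every token, a list of already gate-weighted expert output vectors.
    """
    combined = []
    for expert_vectors in gated_outputs:
        total = [0.0] * len(expert_vectors[0])
        for vec in expert_vectors:
            if rng.random() < p_eom:
                continue  # this expert's output is masked for this token
            total = [t + v for t, v in zip(total, vec)]
        combined.append(total)
    return combined


# Three tokens, each with two gate-weighted expert outputs (toy values).
tokens = [
    [[0.5, 0.5], [0.25, 0.25]],
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.2, 0.4], [0.6, 0.8]],
]
print(apply_eom(tokens, p_eom=0.2, rng=random.Random(0)))
```

At inference time, no masking would be applied, which corresponds to calling the function with p_eom set to zero.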

Curriculum learning for MMT

Orthogonal to model-side regularization methods such as dropout, we explored regularizing MMT models by means of CL. We proposed starting training with high-resource pairs first, then introducing low-resource pairs (which are prone to overfitting) in later phases. To derive the phases of the curriculum, we first trained a vanilla MoE model (without CL), followed by partitioning the translation directions into n bins {b₁, …, b_n}. If T is the total number of training updates, we introduced each bin b_i after T − k_i updates. We chose when \({({k}_{i})}_{i}\) and which directions \({({b}_{i})}_{i}\) to add at each phase according to the step at which we observed a language pair starting to overfit. See the step-based CL algorithm in ref. ⁶⁴ for more on how the directions are partitioned. See Supplementary Information E.2 for the list of directions added at each stage.
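The schedule itself is simple to express: a bin b_i becomes active once training has passed T − k_i updates. The sketch below uses hypothetical bin contents and step counts purely to show the mechanics; it does not reproduce the actual curriculum.

```python
def active_directions(step, total_updates, bins):
    """Return the translation directions in training at a given update step.

    bins: list of (k_i, directions); bin i is introduced after total_updates - k_i updates.
    """
    active = []
    for k_i, directions in bins:
        if step >= total_updates - k_i:
            active.extend(directions)
    return active


# Hypothetical curriculum: T = 100,000 updates and three bins.
T = 100_000
bins = [
    (100_000, ["high_resource_pair_1", "high_resource_pair_2"]),  # present from step 0
    (40_000, ["low_resource_pair_1"]),                            # joins at step 60,000
    (10_000, ["very_low_resource_pair_1"]),                       # joins at step 90,000
]

for step in (0, 60_000, 95_000):
    print(step, active_directions(step, T, bins))
```

A pair that overfits quickly gets a small k_i, so it only sees the final stretch of training.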

Evaluations

Automatic evaluation

Many automatic translation quality assessment metrics exist, including model-based ones such as COMET⁶⁵ and BLEURT⁶⁶. Although model-based metrics have shown better correlation with human judgement in recent metrics shared tasks of the WMT⁴³, they require training and are not easily extendable to a large set of low-resource languages. In this work, we rely on BLEU (and a variant of it) and chrF++. Both measures draw on the idea that translation quality can be quantified based on how similar a machine translation output is compared with that produced by a human translator.

BLEU and spBLEU

The BLEU score⁴⁴ has been the standard metric for machine translation evaluation since its inception two decades ago. It measures the overlap between machine and human translations by combining the precision of 1-grams to 4-grams with a brevity penalty. The main disadvantage of BLEU is that it is tokenization-dependent. Efforts such as sacrebleu⁶⁷ have taken strides towards standardization, supporting the use of community-standard tokenizers under the hood. However, these tokenizers do not extend to many languages. Reference ⁴¹ proposes spBLEU, a BLEU metric based on a standardized SentencePiece model (SPM) covering 101 languages, released alongside FLORES-101. In this work, we provide SPM-200 along with FLORES-200 to enable the measurement of spBLEU. (Our analyses demonstrate that there are minor differences between SPM-200 from FLORES-200 and SPM-100 from FLORES-101 when measuring on the FLORES-101 languages. The major advantage of SPM-200 is that it covers 200 languages. More details on SPM-200 are reported in section 8.1.1 of ref. ³⁴).
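For readers unfamiliar with the metric, here is a simplified sentence-level BLEU: the geometric mean of clipped 1-gram to 4-gram precisions multiplied by a brevity penalty. It assumes whitespace tokenization, a single reference and no smoothing, so it is a teaching sketch rather than a replacement for sacrebleu; spBLEU would swap the `split()` calls for SentencePiece pieces.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed single-reference BLEU on whitespace tokens, in the range [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngram_counts(hyp, n), ngram_counts(ref, n)
        overlap = sum((hyp_counts & ref_counts).values())  # clipped matches
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # without smoothing, one empty order zeroes the score
        log_precisions.append(math.log(overlap / total))
    # The brevity penalty discourages hypotheses shorter than the reference.
    brevity_penalty = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


print(sentence_bleu("the cat sat on", "the cat sat on the mat"))
```

The dependence on `split()` is exactly the tokenization sensitivity discussed above: a different tokenizer changes the n-gram counts and therefore the score.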

chrF++

The chrF++ score³⁸ overcomes the limitation of the BLEU score, which requires that a sentence can be broken up into word tokens. However, some languages, such as Chinese or Thai, do not use spaces to separate words, and word segmentation tools may not be readily available. There is also a concern about highly agglutinative languages in which BLEU fails to assign any credit to morphological variants. chrF++ overcomes these weaknesses by basing the overlap calculation on a character-level n-gram F-score (n ranging from 1 to 6) and complementing it with word unigrams and bigrams. In this work, we primarily evaluated with chrF++ using the settings from sacrebleu. However, when comparing with other published work, we used BLEU and spBLEU where appropriate.
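The idea behind chrF++ can be sketched as follows: collect precision and recall for character n-grams of order 1 to 6 and word n-grams of order 1 to 2, average them, and combine them into an F-score that weights recall more heavily. The averaging scheme, the removal of spaces before character n-gram extraction and β = 2 are assumptions here; this simplified version is not expected to reproduce sacrebleu's numbers exactly.

```python
from collections import Counter


def _ngrams(seq, n):
    """Multiset of n-grams over a string (characters) or a list (words)."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified chrF++-style score in the range [0, 1]."""
    hyp_chars, ref_chars = hypothesis.replace(" ", ""), reference.replace(" ", "")
    hyp_words, ref_words = hypothesis.split(), reference.split()
    units = [(hyp_chars, ref_chars, n) for n in range(1, char_order + 1)]
    units += [(hyp_words, ref_words, n) for n in range(1, word_order + 1)]

    precisions, recalls = [], []
    for hyp, ref, n in units:
        hyp_counts, ref_counts = _ngrams(hyp, n), _ngrams(ref, n)
        if not hyp_counts or not ref_counts:
            continue  # order is longer than one of the strings
        overlap = sum((hyp_counts & ref_counts).values())
        precisions.append(overlap / sum(hyp_counts.values()))
        recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)


# A morphological variant still earns partial credit at the character level.
print(chrf_pp("walked", "walking"))
```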

Human evaluation methodology

When building machine translation systems for thousands of different language pairs, a core question is which pairs reach certain levels of quality. Therefore, we needed meaningful scores that are comparable across language pairs.

XSTS evaluation protocol

We adapted the recently proposed XSTS methodology⁴⁸. In short, XSTS is a human evaluation protocol focusing on meaning preservation above fluency. See details on this protocol in Supplementary Information F. For low-resource languages, translations are usually of poorer quality, and so we focused more on usable (that is, meaning-preserving) translations, even if they are not fully fluent. Compared with Direct Assessment⁶⁸ with a 5-point scale (the original direct assessment uses a 100-point scale), XSTS has been found to yield higher inter-annotator agreement⁴⁷. XSTS rates each source sentence and its machine translation on a 5-point scale, in which 1 is the lowest and 5 is the highest.

Calibration set

To enable meaningful scores comparable across language pairs, we asked each evaluator to provide assessments using the XSTS scale on precisely the same set of sentence pairs. This aims to identify annotators who have a systematic tendency to be more harsh or generous in their scoring and correct for this effect. The calibration set consists of the machine translation output paired with the reference translation only in English. Based on how evaluators used the XSTS scale on this calibration set, we adjusted their raw scores on the actual evaluation task to ensure consistency across evaluators. Although this monolingual calibration task does not precisely mimic the bilingual XSTS task, it is a reasonable first approximation and has been shown to increase the correlation between human and automatic metrics primarily by reducing one source of ‘noise’ in the human evaluations—the lack of score calibration between annotators.

Obtaining aggregated human quality metrics from multiple studies

To obtain an aggregate human quality metric for each language direction in an evaluation study, we take the majority XSTS score (that is, mean–median score) for each sentence and average these majority scores over all evaluated sentences. In a given study, the aggregate human evaluation score for any translation direction l_s → l_t is

$${H}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}=\frac{1}{| {{\mathcal{T}}}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}| }\sum _{(S,T)\in {{\mathcal{T}}}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}}{\rm{median}}\{{X}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},i}(S,T)| 1\le i\le {M}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\},$$

(4)

where l_s and l_t denote the source language and the target language, respectively; \({X}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},i}(S,T)\) denotes the XSTS score of the ith evaluator who evaluates sentences in a given translation direction l_s → l_t for a source sentence S and a target sentence T; \({M}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\) denotes the total number of evaluators who evaluate the (source, translation) sentence pair (S, T) for translation direction l_s → l_t; \({{\mathcal{T}}}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}=\{({S}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},k},{T}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}},k})| 1\le k\le {N}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\}\) is the set of \({N}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}\) (source, translation) sentence pairs being evaluated for translation direction l_s → l_t.
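Equation (4) amounts to taking the median rating per sentence pair and then averaging those medians over the evaluated set. A minimal sketch, with made-up ratings and a hypothetical function name:

```python
from statistics import median


def aggregate_xsts(ratings_per_pair):
    """Mean over sentence pairs of the median XSTS score across evaluators."""
    majority_scores = [median(scores) for scores in ratings_per_pair]
    return sum(majority_scores) / len(majority_scores)


# Illustrative ratings: three sentence pairs, each scored by three evaluators.
ratings = [
    [4, 5, 4],
    [2, 3, 3],
    [5, 5, 4],
]
print(aggregate_xsts(ratings))
```

Using the median per sentence makes the aggregate robust to a single evaluator who rates one pair far from the others.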

Every evaluator in a given study s is also asked to provide ratings for all or parts of a calibration set, \({{\mathcal{C}}}_{s}=\{({S}_{s,k},{T}_{s,k})| 1\le k\le {K}_{s}\}\). S_s,k denotes the kth source sentence in the calibration set for an evaluation study s; T_s,k denotes the translated sentence corresponding to S_s,k; and \({K}_{s}=| {{\mathcal{C}}}_{s}| \) is the number of sentence pairs in the calibration set for an evaluation study.

For each language direction evaluated in a study, we obtained the majority score on the calibration set as follows:

$${C}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}^{(s)}=\frac{1}{| {{\mathcal{C}}}_{s}| }\sum _{(S,T)\in {{\mathcal{C}}}_{s}}{\rm{median}}\{{X}_{l,i}^{(s)}(S,T)| 1\le i\le {M}_{{l}_{{\rm{s}}}\to {l}_{{\rm{t}}}}^{(s)}\},$$

(5)

where \({X}_{l,i}^{(s)}(S,T)\) denotes the XSTS score provided by the ith evaluator, for the language direction l_s → l_t, in study s, for a given source sentence S and a translated sentence T, in the calibration set \({{\mathcal{C}}}_{s}\) of the study.

To obtain aggregated calibrated XSTS scores on the language direction level, we explored several different calibration methodologies. None of the calibration methods we investigated showed a marked difference in correlation with automated scores, and all calibration methodologies we explored provided superior correlation compared with uncalibrated XSTS scores. For more details on these calibration methodologies, see section 7.2 of ref. ³⁴.

Added toxicity detection for 200 languages

To enable toxicity detection at scale, we used a detector based on word lists. In this section, we provide more details about our toxicity definition and describe the detector (ETOX) and associated word lists.

Toxic content

Owing to the subjective nature of toxicity, definitions of toxic language can vary. We included items that are commonly referred to as vulgar or profane language. (Note that vulgar or profane language is not always necessarily toxic. Some common slang, for instance, may be considered vulgar but is not necessarily toxic). Moreover, we also included items associated with depictions of pornographic content or sexual acts, some frequently used hate speech expressions and some expressions tied to bullying. We also included items, vulgar or not, referring to body parts that are commonly associated with sexual practices.

The ETOX detector

We started with the assumption that general-purpose machine translation systems should remain faithful to the source content and not add any toxic elements during the translation process. We define toxic elements as word tokens or short phrases present in our lists. ETOX identifies added toxicity using the following two criteria: number of toxic items and matched or non-matched toxicity. A toxic item is considered detected if it is present in a line and surrounded by spaces or the start or end of a line. ETOX tracks the number of unique toxic items found in a line but does not count a phrase again if it has multiple occurrences. Matched toxicity indicates that the number of toxic items is the same in both the source and the translated content (that is, no added toxicity). Added toxicity is an instance of non-matched toxicity in which more toxic items are found in the translation output than in the source. For non-segmenting languages or some languages that use complex diacritics, space tokenization is insufficient to distinguish words from one another. In those cases, we used SentencePiece tokenization of both the sentence and toxicity word list.
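The matching rule described above can be sketched for space-segmented languages as follows. The word lists here are harmless placeholders, matching is assumed to be exact and case-sensitive, and the SentencePiece fallback for non-segmenting languages is not shown.

```python
def count_toxic_items(line, toxic_items):
    """Number of unique list items found in the line, delimited by spaces or line boundaries."""
    padded = " " + " ".join(line.split()) + " "  # normalize whitespace, pad both ends
    return sum(1 for item in set(toxic_items) if f" {item} " in padded)


def has_added_toxicity(source, translation, source_list, target_list):
    """True when the translation contains more toxic items than the source."""
    return count_toxic_items(translation, target_list) > count_toxic_items(source, source_list)


# Placeholder lists standing in for two Toxicity-200 lists.
source_list = {"zork"}
target_list = {"blarg", "blarg flim"}

print(count_toxic_items("a blarg and another blarg", target_list))      # repeated item counted once
print(has_added_toxicity("a clean sentence", "un blarg ici", source_list, target_list))
print(has_added_toxicity("a zork sentence", "un blarg ici", source_list, target_list))
```

Because items must be delimited by spaces or line boundaries, a list entry that appears only inside a longer word is not counted.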

Toxicity-200 lists

Lists are based on professional translations from English, which were then heuristically adapted by linguists to better serve the target language. As toxicity is culturally sensitive, attempting to find equivalents in a largely multilingual setting constitutes a challenge when starting from one source language. To address this issue, translators were allowed to forgo translating some of the source items and add more culturally relevant items.

In the initial release of the Toxicity-200 lists, the average number of items in a toxicity detection list was 271 entries, whereas the median number of entries was 143. The latter may be a better measure of central tendency than the mean average, given that languages with a rich inflectional morphology constitute extreme outliers (for example, the Czech list had 2,534 entries and the Polish list 2,004). The shortest list had 36 entries, and the longest 6,078.

June 5, 2024
First pig-to-human liver transplant recipient ‘doing very well’

The surgery to implant a genetically modified pig’s liver into the 71-year-old man lasted 8 hours. Credit: David Tadevosian/Shutterstock

A 71-year-old man in China has become the first living person to receive a liver transplant from a genetically modified pig, and the fifth person reported to have received a pig organ. More than two weeks after the surgery, the man is “doing very well”, says Sun Beicheng, a surgeon at the First Affiliated Hospital of Anhui Medical University who led the transplantation.

The surgeons have not released many details about the procedure, but researchers are encouraged by the apparent success. “It is very exciting news,” says Burcin Ekser, a transplant surgeon at Indiana University School of Medicine in Indianapolis.

The liver is the latest in a series of pig organs introduced to people. Since early 2022, surgeons have transplanted pig hearts, kidneys and a thymus into four people. Three died in the months after receiving their transplants, and researchers say their pre-existing poor health, which contributed to their selection as transplant candidates, makes it difficult to determine whether the transplants were a factor. One person who was operated on in mid-April is still alive today.

The transplants have allowed researchers to gain valuable insights into the feasibility of xenotransplantation — the transfer of organs from one species to another. Clinicians hope the technology might one day supply organs for the thousands of people who die waiting for a donor organ each year.

Xenotransplantation of livers has experienced a surge this year. In January 2024, a US team connected a genetically modified pig liver outside the body of a clinically dead person. In March, Kefeng Dou, a transplant surgeon at Xijing Hospital of the Air Force Medical University in Xi’an and his colleagues transplanted a genome-edited pig liver into a clinically dead individual for 10 days, as agreed with the man’s family, and saw no signs of rejection. And earlier in May, another team in China transplanted a pig kidney and liver into a clinically dead person.

Right lobe

In the most recent pig-organ transplant, the recipient had a large tumour on the right lobe of his liver, which had not yet spread to other parts of the body. The individual was not eligible to receive a human liver transplant because tests indicated that his liver was functioning too poorly to ensure a good outcome, and his left lobe alone would not be able to keep him alive, says Sun. The doctors didn’t know “when the tumour would rupture”, he says. The situation was “very dangerous”. With few other options, Sun says the patient and his family expressed interest in the xenotransplant. The surgery team say they obtained approval from their hospital’s ethics and transplantation committees on compassionate grounds.

On May 17, in an operation that lasted eight hours, surgeons removed the individual’s right lobe. They replaced it with a 514-gram liver from an 11-month-old miniature pig, weighing 32 kilograms.

The pig had ten genetic modifications to prevent its organs from being rejected soon after being transplanted, says Hong-Jiang Wei at Yunnan Agricultural University, in Kunming, whose team developed the pig. The team deactivated three genes that contribute to the production of sugars on the surface of pig cells, which the human immune system attacks, and introduced seven genes that express human proteins.

Sun says that in tests of the pig liver, they did not detect the presence of porcine cytomegalovirus, which could have contributed to complications in a recipient of a pig heart, who died two months after the procedure.

Save or support

Once the surgeons had re-established blood flow to the transplanted pig liver, it instantly began to secrete bile. From 10 millilitres on the first day, bile production gradually increased to 200–300 mL on day 13 (a healthy person secretes at least 400 mL of bile a day). Sun says that he has not seen signs of the organ being rejected, including from a biopsy conducted on day 12. “He has normal liver function,” says Sun.

“That is a very positive result,” says Jay Fishman, a specialist in transplant infectious disease at Massachusetts General Hospital in Boston. “In general you don’t see those kinds of good signs if the organ is suffering rejection.”

Livers tend to experience less rejection and injury than a kidney, heart or lung, says Fishman, although he cautions that signs of chronic rejection could appear later.

In addition to bile, Sun says the pig liver is producing pig versions of albumin and coagulation factors. From the way these essential proteins function, “we may learn a great deal”, says David Cooper, a xenotransplant immunologist at Massachusetts General Hospital in Boston. If the researchers identify that the pig versions of these proteins do not serve the needs of the recipient, future transplants might genetically manipulate the pigs to produce the human versions.

At day 10, Sun says the team had not yet seen signs of liver growth, but that they remain optimistic. He says they hope that ultimately the person’s left lobe will grow large enough to provide full liver function and that the pig liver will serve as a bridge to get to that point.

May 31, 2024
The immune system can sabotage gene therapies — can scientists rein it in?

A child receives a gene therapy for Duchenne muscular dystrophy. Credit: Elisabeth Schneider Charpentier/Look At Sciences via Science Photo Library

When Donavon Decker volunteered for a trial of a gene therapy, it wasn’t for his own benefit. Decker has a genetic muscle disorder, but the trial aimed to assess only the therapy’s safety, not its effectiveness. And the experimental treatment — a virus that would shuttle a healthy gene into his cells — would be injected into a muscle in his foot and was not expected to travel much farther.

What’s more, his immune response to the virus might rule out future treatments: an assault mounted by his immune system on the virus could not only disable the therapy but also harm Decker.

Decker thought of his family — he had four sisters and two nieces with the same condition, limb-girdle muscular dystrophy — and enlisted anyway. And, he thought, scientists would eventually work out a way to quench immune responses to the virus, giving people like him access to future gene therapies.

Nearly a quarter of a century later, that has not happened. “It’s a big disappointment to me,” he says. “I really didn’t think I was going to be here 25 years later and still not be able to be re-dosed.”

The field of gene therapy has blossomed over the past decade, generating a stream of official approvals for various treatments and a burgeoning pipeline of clinical trials. But the inability to administer more than one dose of a virus carrying restorative genes limits what gene therapy can do. At the American Society of Gene and Cell Therapy annual meeting in Baltimore, Maryland, on 7–11 May, researchers presented myriad potential ways of overcoming the problem, from suppressing immune responses to cloaking the virus or leaving it out altogether.

“This is a huge issue for the field,” says Martin Kang, who develops gene therapies for respiratory conditions at the Medical University of South Carolina in Charleston.

Dosing dangers

The need for a solution has become clearer as researchers have learnt more about gene therapy. Long-term data show that the effects of some gene therapies wane over time¹; others might need to be given in multiple doses to provide a significant benefit even in the short term. And many people are ineligible to participate in clinical trials at all because of previous exposure to adeno-associated viruses (AAV), relatively harmless viruses that are used in many gene therapies and that circulate in the environment.

“These are the new heartbreaks in the rare-disease community,” says Annie Kennedy, chief of policy, advocacy and patient engagement at the EveryLife Foundation for Rare Diseases in Washington DC. “There’s now this new measure that you have no control over: whether or not you have a pre-existing antibody.”

Studies in multiple countries have estimated that 30–70% of the population has antibodies that can neutralize AAV. Some families, eager to enrol a loved one in a clinical trial, will choose to self-isolate for years to minimize the risk of exposure to AAV.

Taming side effects

Researchers working on mice have spent years looking for drugs that can prevent immune responses to gene therapy. Some have been trying medications that prevent rejection after organ transplants. Others are attempting to dampen the activity of antibody-producing cells called B cells.

But so far, the results have been disappointing. “There’s a ton of work in this space,” says Lindsey George, a paediatrician at the University of Pennsylvania in Philadelphia. “But I haven’t seen anything that’s really viable coming out.”

One problem might be the intense focus on B-cell responses, says Kang, because other immune cells called T cells are also capable of remembering past encounters with viruses. “T-cell responses are absolutely critical,” he says. “They might play a larger role than people realize.”

One–two punch

At the Baltimore meeting, researchers presented the results of animal studies suggesting that more effective methods might be on the horizon. Nicholas Giovannone, an immunologist at Regeneron in Tarrytown, New York, described antibodies that bind to and block an important protein called CD40 used by both B cells and T cells. Mice given the antibody before receiving AAV had levels of antibodies against the virus that were indistinguishable from those of mice that had not been given AAV. “I’ve never seen anything like it before,” Giovannone said. “We think this might be a one–two punch where we can tackle both the B- and T-cell response.”

Kang and his colleagues have also been trying to mute T-cell responses since finding that their experimental gene therapy for a genetic lung disorder called surfactant protein B deficiency might need to be readministered to achieve long-term benefits. At the meeting, Kang reported results from his team’s efforts to suppress T-cell and other immune responses to AAV by inserting certain genetic sequences into the virus. They found that one dose of this enhanced gene therapy suppressed some immune responses against AAV in mice — but not all.

To their surprise, a second dose of the gene therapy was nevertheless effective against the respiratory ailment. It’s a mystery why the approach worked despite the residual immune responses, says Kang, but might have something to do with the fact that the therapy was administered directly into the lungs, rather than the bloodstream.

As is often the case in medicine, it might ultimately take a combination of approaches to achieve re-dosing of gene therapies, says Julie Crudele, a gene-therapy researcher at the University of Washington in Seattle. “The answer is likely to be a cocktail.”

Others are focusing on alternatives to AAV. At the meeting, Chris Wright, head of translational research at Ring Therapeutics in Cambridge, Massachusetts, presented data showing that a class of viruses called anelloviruses can evade detection by the mouse immune system, can shuttle DNA into mouse cells and can be administered multiple times safely.

And many researchers are working on non-viral alternatives, such as fatty particles that can carry DNA or RNA into cells, similar to those used in mRNA vaccines against COVID-19.

Long wait

Decker has decided to take matters into his own hands and is raising money to launch a company focused on non-viral methods of gene therapy. Last time he was tested for AAV antibodies, 14 years after his clinical trial, he was still positive.

Despite his frustration, Decker does not regret his decision to participate in the clinical trial 25 years ago. Two weeks after he was treated, the death of a teenager named Jesse Gelsinger in another gene-therapy study sent the field spinning. It would take years to right itself, and Decker is grateful that he was able to contribute to data that might have helped the field to progress even during turbulent times.

“The only reason, in my opinion, that gene therapy is even possible today is because of the trial I was in,” he says.

May 28, 2024
Guidelines for academics aim to lessen ethical pitfalls in generative-AI use

New guidelines aim to safeguard researchers and study participants from AI risks. Credit: J Studios/Getty

A new toolkit to help academics to use generative artificial intelligence (genAI) more ethically is being developed by researchers in the United Kingdom.

“Generative AI is so new, we just don’t have any guidance,” says Wendy Moncur, a cybersecurity researcher at the University of Strathclyde in Glasgow, UK, who is leading the project. Academics are already considering the potential quandaries with use of genAI tools, she says, “but wouldn’t it be a useful thing, if they had a little checklist to say, ‘These are the things you need to think about; these are the strengths; and these are the threats.’”

The project focuses on issues that might arise when genAI tools — such as ChatGPT, made by OpenAI in San Francisco, California, and Google’s Gemini, which are powered by large language models (LLMs) — are used to analyse and process personal information from study volunteers.

It was inspired by an ongoing study, led by Moncur, that is looking into how people going through major life transitions — such as being diagnosed with cancer or undergoing gender reassignment — can manage their privacy online.

In the work, Moncur and her team are using genAI tools to create teaching materials, on the basis of participants’ stories, that are intended to guide others through similar life changes.

The participants had shared details about their experiences — such as how their work and relationships were affected — under the assurance that the information would be shared with others only in an anonymized form. But before the team started feeding this information into a genAI program, Moncur suddenly feared that, if the tool pieced together publicly available information with the anonymized data that it was being fed, the participants might accidentally be reidentifiable.

The team was also concerned about LLMs’ tendency to ‘hallucinate’ — generating nonsensical or incorrect information — which could potentially slander reidentified participants. And LLMs can change the meaning of the information fed into them, because they are influenced by social and other biases inherent in their design. For example, Moncur says the program that her team used would distort what the participants had said, making their stories more positive than the participants had intended. “ChatGPT has a bit of a ‘Pollyanna thing’ going on, in that it doesn’t like unhappy endings,” says Moncur. “So, it needs a bit of a nudge to produce a credible story.”

Outlining the issues

Moncur’s concerns prompted her to team up with computer scientists Ali Farooq and Ryan Gibson at the University of Strathclyde and Burkhard Schafer, a legal scholar at the University of Edinburgh, UK, to collaborate on solutions. Funded by the UK National Research Centre on Privacy, Harm Reduction and Adversarial Influence Online, they launched a ten-month project to develop guidelines for researchers and university ethics committees, due to be completed in August.

In March, the European Commission’s European Research Area Forum released guidelines on the responsible use of AI, which will feed into the work that Moncur and her team are doing.

Moncur says the project has three main objectives: to address the lack of expertise in identifying privacy risks caused by using genAI in research; to address data-management requirements in UK research, many of which don’t account for the growing use of genAI; and to address the legal risks for institutions that are using genAI to analyse or process participant data.

The project is designed to look at AI use in research broadly, but will include focus areas, such as how to protect privacy when using AI to process medical data, says Farooq.

The team is doing a literature review to characterize how researchers are using genAI to handle personal data, and is planning to interview academics who serve on ethics committees at UK universities.

Informed by the insights from these projects, the team will develop a toolkit based on analysis of strengths, weaknesses, opportunities and threats, which ethics committees and researchers can consult when they are reviewing or planning projects that will involve genAI technologies. The team plans to make this tool freely available online.

Much-needed guidance

Robert Davison, an information-systems scientist at the City University of Hong Kong, welcomes these efforts to create more-robust ethical oversight for genAI use. “It’s highly likely that it will become normal [to use this technology],” says Davison. But he recalls a point made in an editorial published in January¹, which he co-authored: “We do not wish to see a situation where we are lulled into thinking that genAI use is ‘normal’, and that researchers do not need either to pay particular attention to it, or to report their use of it.”

Davison is keen to see ethical norms established around genAI use, but is wary of a siloed approach to setting these standards. Broader ethical standards would be ideal, he says, but adds that it’s unclear who would be best placed to provide — and enforce — such guidelines.

For now, Moncur and her colleagues will target university ethics committees. “Researchers are under such pressure to be efficient — they’re overloaded,” says Moncur. “If you’ve got a tool [such as AI] that’s going to make things more efficient, then it makes sense to use the tool. But we need information to help us use the tools responsibly, and in a way that allows us to do good science.”

Nature Index’s news and supplement content is editorially independent of its publisher, Springer Nature. For more information about Nature Index, see the homepage.

May 22, 2024
Protesters Are Fighting to Stop AI, but They’re Split on How to Do It

Would it be too disruptive if protesters staged sit-ins or chained themselves to the doors of AI developers, one member of the Discord asked. “Probably not. We do what we have to, in the end, for a future with humanity, while we still can.”

Meindertsma had been worried about the consequences of AI after reading Superintelligence, a 2014 book by philosopher Nick Bostrom that popularized the idea that very advanced AI systems could pose a risk to human existence altogether. Joseph Miller, the organizer of PauseAI’s protest in London, was similarly inspired.

It was the launch of OpenAI’s large language model GPT-3 in 2020 that really got Miller worried about the trajectory AI was on. “I suddenly realized that this is not a problem for the distant future, this is something where AI is really getting good now,” he says. Miller joined an AI safety research nonprofit and later became involved with PauseAI.

Bostrom’s ideas have been influential in the “effective altruism” community, a broad social movement that includes adherents of long-termism: the idea that influencing the long-term future should be a moral priority of humans today. Although many of PauseAI’s organizers have roots in the effective altruism movement, they’re keen to reach beyond philosophy and garner more support for their cause.

Director of Pause AI US, Holly Elmore, wants the movement to be a “broad church” that includes artists, writers, and copyright owners whose livelihoods are put at risk from AI systems that can mimic creative works. “I’m a utilitarian. I’m thinking about the consequences ultimately, but the injustice that really drives me to do this kind of activism is the lack of consent” from companies producing AI models, she says.

“We don’t have to choose which AI harm is the most important when we’re talking about pausing as a solution. Pause is the only solution that addresses all of them.”

Miller echoed this point. He says he’s spoken to artists whose livelihoods have been impacted by the growth of AI art generators. “These are problems that are real today, and are signs of much more dangerous things to come.”

One of the London protesters, Gideon Futerman, has a stack of leaflets he’s attempting to hand out to civil servants leaving the building opposite. He has been protesting with the group since last year. “The idea of a pause being possible has really taken root since then,” he says.

Futerman is optimistic that protest movements can influence the trajectory of new technologies. He points out that pushback against genetically modified organisms was instrumental in turning Europe off of the technology in the 1990s. The same is true of nuclear power. It’s not that these movements necessarily had the right ideas, he says, but they prove that popular protests can stymie the march even of technologies that promise low-carbon power or more bountiful crops.

In London, the group of protesters moves across the street in order to proffer leaflets to a stream of civil servants leaving the government offices. Most look steadfastly uninterested, but some take a sheet. Earlier that day Rishi Sunak, the British prime minister who six months earlier had hosted the first AI Safety Summit, had made a speech where he nodded to fears of AI. But after that passing reference, he focused firmly on the potential benefits.

The Pause AI leaders WIRED spoke with said they were not considering more disruptive direct action such as sit-ins or encampments near AI offices for now. “Our tactics and our methods are actually very moderate,” says Elmore. “I want to be the moderate base for a lot of organizations in this space. I’m sure we would never condone violence. I also want Pause AI to go further than that and just be very trustworthy.”

Meindertsma agrees, saying that more disruptive action isn’t justified at the moment. “I truly hope that we don’t need to take other actions. I don’t expect that we’ll need to. I don’t feel like I’m the type of person to lead a movement that isn’t completely legal.”

The Pause AI founder is also hopeful that his movement can shed the “AI doomer” label. “A doomer is someone who gives up on humanity,” he says. “I’m an optimistic person; I believe we can do something about this.”

May 13, 2024
OpenAI Is ‘Exploring’ How to Responsibly Generate AI Porn

OpenAI released draft documentation Wednesday laying out how it wants ChatGPT and its other AI technology to behave. Part of the lengthy Model Spec document discloses that the company is exploring a leap into porn and other explicit content.

OpenAI’s usage policies currently prohibit sexually explicit or even suggestive materials, but a “commentary” note on part of the Model Spec related to that rule says the company is considering how to permit such content.

“We’re exploring whether we can responsibly provide the ability to generate NSFW content in age-appropriate contexts through the API and ChatGPT,” the note says, using a colloquial term for content considered “not safe for work.” “We look forward to better understanding user and societal expectations of model behavior in this area.”

The Model Spec document says NSFW content “may include erotica, extreme gore, slurs, and unsolicited profanity.” It is unclear if OpenAI’s explorations of how to responsibly make NSFW content envisage loosening its usage policy only slightly, for example to permit generation of erotic text, or more broadly to allow descriptions or depictions of violence.

In response to questions from WIRED, OpenAI spokesperson Grace McGuire said the Model Spec was an attempt to “bring more transparency about the development process and get a cross section of perspectives and feedback from the public, policymakers, and other stakeholders.” She declined to share details of what OpenAI’s exploration of explicit content generation involves or what feedback the company has received on the idea.

Earlier this year, OpenAI’s chief technology officer, Mira Murati, told The Wall Street Journal that she was “not sure” if the company would in future allow depictions of nudity to be made with the company’s video generation tool Sora.

AI-generated pornography has quickly become one of the biggest and most troubling applications of the type of generative AI technology OpenAI has pioneered. So-called deepfake porn—explicit images or videos made with AI tools that depict real people without their consent—has become a common tool of harassment against women and girls. In March, WIRED reported on what appear to be the first US minors arrested for distributing AI-generated nudes without consent, after Florida police charged two teenage boys for making images depicting fellow middle school students.

“Intimate privacy violations, including deepfake sex videos and other nonconsensual synthesized intimate images, are rampant and deeply damaging,” says Danielle Keats Citron, a professor at the University of Virginia School of Law who has studied the problem. “We now have clear empirical support showing that such abuse costs targeted individuals crucial opportunities, including to work, speak, and be physically safe.”

Citron calls OpenAI’s potential embrace of explicit AI content “alarming.”

As OpenAI’s usage policies prohibit impersonation without permission, explicit nonconsensual imagery would remain banned even if the company did allow creators to generate NSFW material. But it remains to be seen whether the company could effectively moderate explicit generation to prevent bad actors from using the tools. Microsoft made changes to one of its generative AI tools after 404 Media reported that it had been used to create explicit images of Taylor Swift that were distributed on the social platform X.

Additional reporting by Reece Rogers

May 8, 2024
The US Is Cracking Down on Synthetic DNA

The White House has issued new rules aimed at companies that manufacture synthetic DNA after years of warnings that a pathogen made with mail-order genetic material could accidentally or intentionally spark the next pandemic.

The rules, released on April 29, are the result of an executive order signed by President Joe Biden last fall to establish new standards for AI safety and security, including AI applied to biotechnology.

Artificially generated DNA allows researchers to do all sorts of things—develop diagnostic tests, make beneficial enzymes to eat up plastic, or engineer potent antibodies to treat disease—without having to extract natural sequences from organisms. Need to study a rare type of bacteria? Instead of going out into the field to collect a sample, researchers can simply order its genetic sequence from a DNA synthesis company.

Synthesizing DNA has been possible for decades, but it’s become easier, cheaper, and faster to do so in recent years thanks to new technology that can “print” custom gene sequences. Now, dozens of companies around the world make and ship synthetic nucleic acids en masse. And with AI, it’s becoming possible to create entirely new sequences that don’t exist in nature—including those that could pose a threat to humans or other living things.

“The concern has been for some time that as gene synthesis has gotten better and cheaper, and as more companies appear and more technologies streamline the synthesis of nucleic acids, that it is possible to de novo create organisms, particularly viruses,” says Tom Inglesby, an epidemiologist and director of the Johns Hopkins Center for Health Security.

It’s conceivable that a bad actor could make a dangerous virus from scratch by ordering its genetic building blocks and assembling them into a whole pathogen. In 2017, Canadian researchers revealed they had reconstructed the extinct horsepox virus for $100,000 using mail-order DNA, raising the possibility that the same could be done for smallpox, a deadly disease that was eradicated in 1980.

The new rules aim to prevent a similar scenario. They ask DNA manufacturers to screen purchase orders to flag so-called sequences of concern and assess customer legitimacy. Sequences of concern are those that contribute to an organism’s toxicity or ability to cause disease. For now, the rules only apply to scientists or companies that receive federal funding: They must order synthetic nucleic acids from providers that implement these practices.

Inglesby says it’s still a “big step forward” since about three-quarters of the US customer base for synthetic DNA are federally funded entities. But it means that scientists or organizations with private sources of funding aren’t beholden to using companies with these screening procedures.

Many DNA providers already follow screening guidelines issued by the Department of Health and Human Services in 2010. About 80 percent of the industry has joined the International Gene Synthesis Consortium, which pledges to vet orders. But both measures are voluntary, and not all companies comply.

May 6, 2024