Tag: quanta magazine

  • NASA’s Quest to Touch the Sun

    The original version of this story appeared in Quanta Magazine.

    Our sun is the best-observed star in the entire universe.

    We see its light every day. For centuries, scientists have tracked the dark spots dappling its radiant face, while in recent decades, telescopes in space and on Earth have scrutinized sunbeams in wavelengths spanning the electromagnetic spectrum. Experiments have also sniffed the sun’s atmosphere, captured puffs of the solar wind, collected solar neutrinos and high-energy particles, and mapped our star’s magnetic field—or tried to, since we have yet to really observe the polar regions that are key to learning about the sun’s inner magnetic structure.

    For all that scrutiny, however, one crucial question remained embarrassingly unsolved. At its surface, the sun is a toasty 6,000 degrees Celsius. But the outer layers of its atmosphere, called the corona, can be a blistering—and perplexing—1 million degrees hotter.

    You can see that searing sheath of gas during a total solar eclipse, as happened on April 8 above a swath of North America. If you were in the path of totality, you could see the corona as a glowing halo around the moon-shadowed sun.

    This year, that halo looked different than the one that appeared during the last North American eclipse, in 2017. Not only is the sun more active now, but you were looking at a structure that we—the scientists who study our home star—have finally come to understand. Observing the sun from afar wasn’t good enough for us to grasp what heats the corona. To solve this and other mysteries, we needed a sun-grazing space probe.

    That spacecraft—NASA’s Parker Solar Probe—launched in 2018. As it loops around the sun, dipping in and out of the solar corona, it has collected data that shows us how small-scale magnetic activity within the solar atmosphere makes the solar corona almost inconceivably hot.

    From Surface to Sheath

    To begin to understand that roasting corona, we need to consider magnetic fields.

    The sun’s magnetic engine, called the solar dynamo, lies about 200,000 kilometers beneath the sun’s surface. As it churns, that engine drives solar activity, which waxes and wanes over periods of roughly 11 years. When the sun is more active, solar flares, sunspots, and outbursts increase in intensity and frequency (as is happening now, near solar maximum).

    At the sun’s surface, magnetic fields accumulate at the boundaries of churning convective cells, known as supergranules, which look like bubbles in a pan of boiling oil on the stove. The constantly boiling solar surface concentrates and strengthens those magnetic fields at the cells’ edges. Those amplified fields then launch transient jets and nanoflares as they interact with solar plasma.

    CAPTION: These churning convective cells on the sun’s surface, each approximately the size of the state of Texas, are closely connected to the magnetic activity that heats the sun’s corona.
    CREDIT: NSO/NSF/AURA

    Magnetic fields can also erupt through the sun’s surface and produce larger-scale phenomena. In regions where the field is strong, you see dark sunspots and giant magnetic loops. In most places, especially in the lower solar corona and near sunspots, these magnetic arcs are “closed,” with both ends attached to the sun. These closed loops come in various sizes—from minuscule ones to the dramatic, blazing arcs seen during eclipses.

  • An Old Abstract Field of Math Is Unlocking the Deep Complexity of Spacecraft Orbits

    The original version of this story appeared in Quanta Magazine.

    In October, a Falcon Heavy rocket is scheduled to launch from Cape Canaveral in Florida, carrying NASA’s Europa Clipper mission. The $5 billion mission is designed to find out if Europa, Jupiter’s fourth-largest moon, can support life. But because Europa is constantly bombarded by intense radiation created by Jupiter’s magnetic field, the Clipper spacecraft can’t orbit the moon itself. Instead, it will slide into an eccentric orbit around Jupiter and gather data by repeatedly swinging by Europa—53 times in total—before retreating from the worst of the radiation. Every time the spacecraft rounds Jupiter, its path will be slightly different, ensuring that it can take pictures and gather data from Europa’s poles to its equator.

    To plan convoluted tours like this one, trajectory planners use computer models that meticulously calculate the path one step at a time. The planning takes hundreds of mission requirements into account, and it’s bolstered by decades of mathematical research into orbits and how to join them into complicated tours. Mathematicians are now developing tools that they hope can be used to create a more systematic understanding of how orbits relate to one another.

    “What we have is the previous computations that we’ve done, that guide us as we do the current computations. But it’s not a complete picture of all the options that we have,” said Daniel Scheeres, an aerospace engineer at the University of Colorado, Boulder.

    “I think that was my biggest frustration when I was a student,” said Dayung Koh, an engineer at NASA’s Jet Propulsion Laboratory. “I know these orbits are there, but I don’t know why.” Given the expense and complexity of missions to the moons of Jupiter and Saturn, not knowing why orbits are where they are is a problem. What if there is a completely different orbit that could get the job done with fewer resources? As Koh said: “Did I find them all? Are there more? I can’t tell that.”

    After getting her doctorate from the University of Southern California in 2016, Koh grew interested in how orbits can be cataloged into families. Jovian orbits that are far from Europa form such a family; so do orbits close to Europa. But other families are less obvious. For instance, for any two bodies, like Jupiter and Europa, there is an intermediate point where the two bodies’ gravitational effects balance, creating an equilibrium point. A spacecraft can orbit this point, even though there is nothing at the center of the orbit. These orbits form a family called Lyapunov orbits. Add a little energy to such an orbit by firing a spacecraft engine, and at first you’ll stay in the same family. But add enough, and you’ll cross over into another family—say, one that includes Jupiter inside its orbits. Some orbit families might require less fuel than others, remain in sunlight at all times, or have other useful features.
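
    The balance point described above can be estimated numerically. Below is a minimal sketch that locates the L1 equilibrium point between Jupiter and Europa in the circular restricted three-body problem; the mass and separation figures are nominal published values used only for illustration, and the bisection tolerance is an arbitrary choice.

```python
# Locate the L1 equilibrium point between Jupiter and Europa in the
# circular restricted three-body problem. Work in the rotating frame,
# with distances in units of the Jupiter-Europa separation.
# Mass and distance values are nominal published figures (assumptions).
M_JUPITER = 1.898e27   # kg
M_EUROPA = 4.8e22      # kg
A_KM = 670_900         # Jupiter-Europa mean separation, km

mu = M_EUROPA / (M_JUPITER + M_EUROPA)  # mass-ratio parameter

def net_accel(x):
    # Force balance along the Jupiter-Europa line for a point between
    # the bodies: centrifugal term minus the two gravitational pulls.
    # Jupiter sits at x = -mu, Europa at x = 1 - mu.
    return x - (1 - mu) / (x + mu) ** 2 + mu / (1 - mu - x) ** 2

# Bisection: net_accel is negative toward Jupiter's side and blows up
# positive just inside Europa's position, so a root is bracketed.
lo, hi = 0.9, 1 - mu - 1e-9
for _ in range(200):
    mid = (lo + hi) / 2
    if net_accel(mid) < 0:
        lo = mid
    else:
        hi = mid

x_l1 = (lo + hi) / 2
dist_km = (1 - mu - x_l1) * A_KM
print(f"L1 sits about {dist_km:,.0f} km from Europa")  # on the order of 13,000-14,000 km
```

    A Lyapunov orbit is a planar periodic orbit around such an equilibrium point; solvers like the one above give the point itself, while finding the surrounding periodic orbits requires integrating the full equations of motion.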

    Dayung Koh, an engineer at NASA’s Jet Propulsion Laboratory, is trying to come to a systematic understanding of how orbits in a planetary system relate to one another.

    PHOTO: Courtesy of Dayung Koh

  • The Mysterious ‘Dark’ Energy That Permeates the Universe Is Slowly Eroding

    Beyond DESI, a slew of new instruments are coming online in the coming years, including the 8.4-meter Vera Rubin Observatory in Chile, NASA’s Nancy Grace Roman Space Telescope, and the European Space Agency’s Euclid mission.

    “Our data in cosmology has made enormous leaps over the last 25 years, and it’s about to make bigger leaps,” Frieman said.

    As they amass new observations, researchers may continue to find that dark energy appears as constant as it has for a generation. Or, if the trend continues in the direction suggested by DESI’s results, it could change everything.

    New Physics

    If dark energy is weakening, it can’t be a cosmological constant. Instead, it may be the same sort of field that many cosmologists think sparked a moment of exponential expansion during the universe’s birth. This kind of “scalar field” could fill space with an amount of energy that looks constant at first—like the cosmological constant—but eventually starts to slip over time.
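
    This “slipping” has a standard textbook description. For a scalar field rolling in a potential V, the equation-of-state parameter w tracks the balance between the field’s kinetic and potential energy: when the field is nearly frozen, w sits at about -1 and mimics a cosmological constant, and it drifts away from -1 as the field starts to roll. (This is generic quintessence lore, not a result of the DESI analysis itself.)

```latex
w_\phi \;=\; \frac{\tfrac{1}{2}\dot{\phi}^{2} - V(\phi)}{\tfrac{1}{2}\dot{\phi}^{2} + V(\phi)},
\qquad
w_\phi \approx -1 \ \text{ when } \ \tfrac{1}{2}\dot{\phi}^{2} \ll V(\phi).
```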

    “The idea that dark energy is varying is very natural,” said Paul Steinhardt, a cosmologist at Princeton University. Otherwise, he continued, “it would be the only form of energy we know which is absolutely constant in space and time.”

    But that variability would bring about a profound paradigm shift: We would not be living in a vacuum, which is defined as the lowest-energy state of the universe. Instead, we would inhabit an energized state that’s slowly sliding toward a true vacuum. “We’re used to thinking that we’re living in the vacuum,” Steinhardt said, “but no one promised you that.”

    The fate of the cosmos would depend on how quickly the number previously known as the cosmological constant declines, and how far it might go. If it reaches zero, cosmic acceleration would stop. If it dips far enough below zero, the expansion of space would turn to a slow contraction—the sort of reversal required for cyclic theories of cosmology, such as those developed by Steinhardt.

    String theorists share a similar outlook. With their proposal that everything boils down to the vibration of strings, they can weave together universes with different numbers of dimensions and all manner of exotic particles and forces. But they can’t easily construct a universe that permanently maintains a stable positive energy, as our universe has seemed to. Instead, in string theory, the energy must either gently fall over the course of billions of years or violently drop to zero or a negative value. “Essentially, all string theorists believe that it’s one or the other. We do not know which one,” said Cumrun Vafa of Harvard University.

    Observational evidence for a gradual decline of dark energy would be a boon for the gentle-fall scenario. “That would be amazing. It would be the most important discovery since the discovery of dark energy itself,” Vafa said.

    But for now, any such speculations are rooted in the DESI analysis in only the loosest of ways. Cosmologists will have to observe many millions more galaxies before seriously entertaining thoughts of revolution.

    “If this holds up, it could light the way to a new, potentially deeper understanding of the universe,” Riess said. “The next few years should be very revealing.”


    Original story reprinted with permission from Quanta Magazine, an editorially independent publication of the Simons Foundation whose mission is to enhance public understanding of science by covering research developments and trends in mathematics and the physical and life sciences.

  • Here’s a Clever Way to Uncover America’s Voting Deserts

    The original version of this story appeared in Quanta Magazine.

    In Georgia’s 2020 primary election, some voters in Atlanta waited over 10 hours to cast a ballot. One reason for the long lines was that almost 10 percent of Georgia’s polling sites had closed over the preceding seven years, despite an influx of about 2 million voters. These closures were disproportionately concentrated in predominantly Black areas that tended to vote Democratic.

    But pinpointing the locations of “voting deserts” isn’t as straightforward as it might seem. Sometimes a lack of capacity is reflected in long waits at the polls, but other times the problem is the distance to the nearest polling place. Combining these factors in a systematic way is tricky.

    In a paper due to be published this summer in the journal SIAM Review, Mason Porter, a mathematician at the University of California, Los Angeles, and his students used tools from topology to do just that. Abigail Hickok, one of the paper’s coauthors, conceived the idea after seeing images of long lines in Atlanta. “Voting was on my mind a lot, partly because it was an especially anxiety-inducing election,” she said.

    Topologists study the underlying properties and spatial relations of geometric shapes under transformation. Two shapes are considered topologically equivalent if one can deform into the other via continuous movements without tearing, gluing, or introducing new holes.

    At first glance, topology would seem to be a poor fit for the problem of polling site placement. Topology concerns itself with continuous shapes, and polling sites are at discrete locations. But in recent years, topologists have adapted their tools to work on discrete data by creating graphs of points connected by lines and then analyzing the properties of those graphs. Hickok said these techniques are useful not only for understanding the distribution of polling places but also for studying who has better access to hospitals, grocery stores, and parks.

    That’s where the topology begins.

    Imagine creating tiny circles around each point on the graph. Each circle starts with a radius of zero and grows with time, but a given circle begins expanding only once the elapsed time exceeds the wait time at its polling place. As a consequence, locations with shorter wait times will have bigger circles, since they start growing first, and locations with longer wait times will have smaller ones.

    Some circles will eventually touch each other. When this happens, draw a line between the points at their centers. If multiple circles overlap, connect all those points into “simplices,” which is just a general term for shapes such as triangles (a 2-simplex) and tetrahedrons (a 3-simplex).
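
    The construction described above amounts to a weighted filtration: each site’s circle starts growing only after its wait time has elapsed, an edge is born when two circles touch, and a triangle can be taken to appear once all three of its edges exist. A toy sketch follows; the site coordinates, wait times, and unit growth rate are invented for illustration.

```python
import math

# Toy polling sites: (x, y) location and wait time. A site's circle has
# radius max(0, t - wait): it starts growing once t exceeds the wait.
sites = {
    "A": ((0.0, 0.0), 1.0),
    "B": ((3.0, 0.0), 5.0),
    "C": ((0.0, 4.0), 2.0),
}

def edge_birth(p, q):
    # Time at which the growing circles around sites p and q first touch.
    (xp, yp), wp = sites[p]
    (xq, yq), wq = sites[q]
    d = math.dist((xp, yp), (xq, yq))
    w_early, w_late = sorted((wp, wq))
    t = w_early + d               # case 1: only the earlier circle grows
    if t <= w_late:
        return t
    return (d + wp + wq) / 2      # case 2: both circles are growing

edges = sorted((edge_birth(p, q), p, q)
               for p, q in [("A", "B"), ("A", "C"), ("B", "C")])
# Take a 2-simplex (triangle) to be born when its last edge appears.
triangle_birth = max(t for t, _, _ in edges)

for t, p, q in edges:
    print(f"edge {p}{q} born at t = {t}")
print(f"triangle ABC born at t = {triangle_birth}")
```

    Sweeping the birth times from small to large reproduces the growing-circles picture: shorter-wait sites connect earlier, and the order in which edges and simplices appear is exactly the structure topological data analysis then summarizes.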

    CREDIT: Merrill Sherman/Quanta Magazine

  • The Quest to Map the Inside of the Proton

    “How are matter and energy distributed?” asked Peter Schweitzer, a theoretical physicist at the University of Connecticut. “We don’t know.”

    Schweitzer has spent most of his career thinking about the gravitational side of the proton. Specifically, he’s interested in a matrix of properties of the proton called the energy-momentum tensor. “The energy-momentum tensor knows everything there is to be known about the particle,” he said.

    In Albert Einstein’s theory of general relativity, which casts gravitational attraction as objects following curves in space-time, the energy-momentum tensor tells space-time how to bend. It describes, for instance, the arrangement of energy (or, equivalently, mass)—the source of the lion’s share of space-time twisting. It also tracks information about how momentum is distributed, as well as where there will be compression or expansion, which can also lightly curve space-time.
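
    The statement that the energy-momentum tensor “tells space-time how to bend” is Einstein’s field equation in compact form (shown here without the cosmological-constant term): the curvature on the left is sourced by the energy-momentum tensor on the right.

```latex
G_{\mu\nu} \;=\; \frac{8\pi G}{c^{4}}\, T_{\mu\nu}
```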

    If we could learn the shape of space-time surrounding a proton, Russian and American physicists independently worked out in the 1960s, we could infer all the properties indexed in its energy-momentum tensor. Those include the proton’s mass and spin, which are already known, along with the arrangement of the proton’s pressures and forces, a collective property physicists refer to as the “Druck term,” after the word for pressure in German. This term is “as important as mass and spin, and nobody knows what it is,” Schweitzer said—though that’s starting to change.

    In the ’60s, it seemed as if measuring the energy-momentum tensor and calculating the Druck term would require a gravitational version of the usual scattering experiment: You fire a massive particle at a proton and let the two exchange a graviton—the hypothetical particle that makes up gravitational waves—rather than a photon. But due to the extreme weakness of gravity, physicists expect graviton scattering to occur 39 orders of magnitude more rarely than photon scattering. Experiments can’t possibly detect such a weak effect.

    “I remember reading about this when I was a student,” said Volker Burkert, a member of the Jefferson Lab team. The takeaway was that “we probably will never be able to learn anything about mechanical properties of particles.”

    Gravity Without Gravity

    Gravitational experiments are still unimaginable today. But research in the late 1990s and early 2000s by the physicists Xiangdong Ji and, working separately, the late Maxim Polyakov revealed a workaround.

    The general scheme is the following. When you fire an electron lightly at a proton, it usually delivers a photon to one of the quarks and glances off. But in fewer than one in a billion events, something special happens. The incoming electron sends in a photon. A quark absorbs it and then emits another photon a heartbeat later. The key difference is that this rare event involves two photons instead of one: an incoming photon and an outgoing one. Ji’s and Polyakov’s calculations showed that if experimentalists could collect the resulting electron, proton, and photon, they could infer from the energies and momenta of these particles what happened with the two photons. And that two-photon experiment would be essentially as informative as the impossible graviton-scattering experiment.

  • A Popular Alien-Hunting Technique Is Increasingly in Doubt

    The third factor is the probability of a lifeless planet producing the observed signal—an equally serious challenge, researchers now realize, that’s tangled up in the problem of unconceived abiotic alternatives.

    “That’s the probability that we argue you can’t fill in responsibly,” Vickers said. “It could almost range from anything from zero to 1.”

    Consider the case of K2-18 b, a “mini-Neptune” that’s intermediate in size between Earth and Neptune. In 2023, JWST data revealed a statistically weak sign of dimethyl sulfide (DMS) in its atmosphere. On Earth, DMS is produced by marine organisms. The researchers who tentatively detected it on K2-18 b interpreted the other gases discovered in its sky to mean that the planet is a “water world” with a habitable surface ocean, supporting their theory that the DMS there comes from marine life. But other scientists interpret the same observations as evidence of an inhospitable, gaseous planetary composition more like Neptune’s.

    Unconceived alternatives have already forced astrobiologists multiple times to revise their ideas about what makes a good biosignature. When phosphine was detected on Venus, scientists didn’t know of any ways it could be produced on a lifeless rocky world. Since then, they’ve identified several feasible abiotic sources of the gas. One scenario is that volcanoes release chemical compounds called phosphides, which could react with sulfur dioxide in Venus’ atmosphere to form phosphine—a plausible explanation given that scientists have found evidence of active volcanism on our twin planet. Likewise, oxygen was considered a biosignature gas until the 2010s, when researchers including Victoria Meadows at the NASA Astrobiology Institute’s Virtual Planetary Laboratory began to find ways that rocky planets could accumulate oxygen without a biosphere. For example, oxygen can form from sulfur dioxide, which abounds on worlds as diverse as Venus and Europa.

    Today, astrobiologists have largely abandoned the idea that a single gas could be a biosignature. Instead, they focus on identifying “ensembles,” or sets of gases that couldn’t coexist without life. If anything can be called today’s gold-standard biosignature, it’s the combination of oxygen and methane. Methane rapidly degrades in oxygen-rich atmospheres. On Earth, the two gases only coexist because the biosphere continuously replenishes them.

    So far, scientists haven’t managed to come up with an abiotic explanation for oxygen-methane biosignatures. But Vickers, Smith and Mathis doubt that this particular pair—or perhaps any mix of gases—will ever be convincing. “There’s no way to be certain that what we’re looking at is actually a consequence of life, as opposed to a consequence of some unknown geochemical process,” Smith said.

    “JWST is not a life detector. It’s a telescope that can tell us what gases are in the atmosphere of a planet,” Mathis said.

    Sarah Rugheimer, an astrobiologist at York University who studies exoplanet atmospheres, is more sanguine. She’s actively looking into alternate abiotic explanations for ensemble biosignatures like oxygen and methane. Still, she says, “I would be popping open a bottle of champagne—very expensive champagne—if we saw oxygen, methane, and water, and CO2” on an exoplanet.

  • The Brain Region That Controls Movement Also Guides Feelings

    The original version of this story appeared in Quanta Magazine.

    In recent decades, neuroscience has seen some stunning advances, and yet a critical part of the brain remains a mystery. I am referring to the cerebellum, so named for the Latin for “little brain,” which is situated like a bun at the back of the brain. This is no small oversight: The cerebellum contains three-quarters of all the brain’s neurons, which are organized in an almost crystalline arrangement, in contrast to the tangled thicket of neurons found elsewhere.

    Encyclopedia articles and textbooks underscore the fact that the cerebellum’s function is to control body movement. There is no question that the cerebellum has this function. But scientists now suspect that this long-standing view is myopic.

    Or so I learned in November in Washington, DC, while attending the Society for Neuroscience annual meeting, the largest meeting of neuroscientists in the world. There, a pair of neuroscientists organized a symposium on newly discovered functions of the cerebellum unrelated to motor control. New experimental techniques are showing that in addition to controlling movement, the cerebellum regulates complex behaviors, social interactions, aggression, working memory, learning, emotion, and more.

    A Crack in Dominant Wisdom

    The connection between the cerebellum and movement has been known since the 19th century. Patients suffering trauma to the brain region had obvious difficulties with balance and movement, leaving no doubt that it was critical for coordinating motion. Over the decades, neuroscientists developed a detailed understanding of how the cerebellum’s unique neural circuitry controls motor function. The explanation of how the cerebellum worked seemed watertight.

    Then, in 1998, in the journal Brain, neurologists reported on wide-ranging emotional and cognitive disabilities in patients with damage to the cerebellum. For example, in 1991, a 22-year-old female college student had fallen while ice skating; a CT scan revealed a tumor in her cerebellum. After it was removed surgically, she was a completely different person. The bright college student had lost her ability to write with proficiency, do mental arithmetic, name common objects, or copy a simple diagram. Her mood flattened. She hid under covers and behaved inappropriately, undressing in the corridors and speaking in baby talk. Her social interactions, including recognizing familiar faces, were also impaired.

    This and similar cases puzzled the authors. These high-level cognitive and emotional functions were understood to reside in the cerebral cortex and limbic system. “Precisely what that cerebellar role is, and how the cerebellum accomplishes it, is yet to be established,” they concluded.

    Despite these clues from clinical studies that conventional wisdom was on the wrong track, leading authorities still insisted that the function of the cerebellum was to control movement and nothing more. “It is kind of sad, because it has been 20 years” since these cases were reported, said Diasynou Fioravante, a neurophysiologist at the University of California, Davis, who co-organized the conference symposium.

    Other neurologists have noticed neuropsychiatric deficits in their patients all along, said the neuroscientist Stephanie Rudolph of Albert Einstein College of Medicine, who co-organized the symposium with Fioravante. However, there was no hard anatomical evidence for how the cerebellum’s unique neural circuitry could possibly regulate the reported psychological and emotional functions, so the clinical reports were overlooked.

  • Large Language Models’ Emergent Abilities Are a Mirage

    The original version of this story appeared in Quanta Magazine.

    Two years ago, in a project called the Beyond the Imitation Game benchmark, or BIG-bench, 450 researchers compiled a list of 204 tasks designed to test the capabilities of large language models, which power chatbots like ChatGPT. On most tasks, performance improved predictably and smoothly as the models scaled up—the larger the model, the better it got. But on other tasks, the improvement wasn’t smooth: performance remained near zero for a while, then abruptly jumped. Other studies found similar leaps in ability.

    The authors described this as “breakthrough” behavior; other researchers have likened it to a phase transition in physics, like when liquid water freezes into ice. In a paper published in August 2022, researchers noted that these behaviors are not only surprising but unpredictable, and that they should inform the evolving conversations around AI safety, potential, and risk. They called the abilities “emergent,” a word that describes collective behaviors that only appear once a system reaches a high level of complexity.

    But things may not be so simple. A new paper by a trio of researchers at Stanford University posits that the sudden appearance of these abilities is just a consequence of the way researchers measure the LLM’s performance. The abilities, they argue, are neither unpredictable nor sudden. “The transition is much more predictable than people give it credit for,” said Sanmi Koyejo, a computer scientist at Stanford and the paper’s senior author. “Strong claims of emergence have as much to do with the way we choose to measure as they do with what the models are doing.”

    We’re only now seeing and studying this behavior because of how large these models have become. Large language models train by analyzing enormous data sets of text—words from online sources including books, web searches, and Wikipedia—and finding links between words that often appear together. The size is measured in terms of parameters, roughly analogous to all the ways that words can be connected. The more parameters, the more connections an LLM can find. GPT-2 had 1.5 billion parameters, while GPT-3.5, the LLM that powers ChatGPT, uses 350 billion. GPT-4, which debuted in March 2023 and now underlies Microsoft Copilot, reportedly uses 1.75 trillion.

    That rapid growth has brought an astonishing surge in performance and efficacy, and no one is disputing that large enough LLMs can complete tasks that smaller models can’t, including ones for which they weren’t trained. The trio at Stanford who cast emergence as a “mirage” recognize that LLMs become more effective as they scale up; in fact, the added complexity of larger models should make it possible to get better at more difficult and diverse problems. But they argue that whether this improvement looks smooth and predictable or jagged and sharp results from the choice of metric—or even a paucity of test examples—rather than the model’s inner workings.
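
    The metric-choice argument can be illustrated with a toy calculation: suppose a model’s per-token accuracy improves smoothly with scale, but the evaluation only counts a 30-token answer as correct if every token is right. The smooth underlying curve then looks like a sudden jump. The accuracy curve and task length below are invented for illustration, not taken from the Stanford paper.

```python
import math

# Toy "model sizes" spanning six orders of magnitude.
sizes = [10 ** k for k in range(6, 13)]

def per_token_accuracy(n):
    # A smooth, monotone curve rising from ~0.5 toward ~1.0 with scale
    # (logistic in log-size; purely illustrative).
    return 0.5 + 0.5 / (1 + math.exp(-(math.log10(n) - 9)))

TASK_LEN = 30  # an "exact match" answer needs all 30 tokens correct

for n in sizes:
    p = per_token_accuracy(n)
    exact = p ** TASK_LEN
    print(f"size {n:.0e}: per-token {p:.3f}, exact-match {exact:.3f}")
```

    Under the per-token metric the improvement is gradual at every step, while the exact-match column sits near zero for small models and then shoots up, which is the kind of apparent “emergence” the Stanford authors attribute to the metric rather than to the model.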

  • Never-Repeating Patterns of Tiles Can Safeguard Quantum Information

    This extreme fragility might make quantum computing sound hopeless. But in 1995, the applied mathematician Peter Shor discovered a clever way to store quantum information. His encoding had two key properties. First, it could tolerate errors that only affected individual qubits. Second, it came with a procedure for correcting errors as they occurred, preventing them from piling up and derailing a computation. Shor’s discovery was the first example of a quantum error-correcting code, and its two key properties are the defining features of all such codes.

    The first property stems from a simple principle: Secret information is less vulnerable when it’s divided up. Spy networks employ a similar strategy. Each spy knows very little about the network as a whole, so the organization remains safe even if any individual is captured. But quantum error-correcting codes take this logic to the extreme. In a quantum spy network, no single spy would know anything at all, yet together they’d know a lot.

    Each quantum error-correcting code is a specific recipe for distributing quantum information across many qubits in a collective superposition state. This procedure effectively transforms a cluster of physical qubits into a single virtual qubit. Repeat the process many times with a large array of qubits, and you’ll get many virtual qubits that you can use to perform computations.

    The physical qubits that make up each virtual qubit are like those oblivious quantum spies. Measure any one of them and you’ll learn nothing about the state of the virtual qubit it’s a part of—a property called local indistinguishability. Since each physical qubit encodes no information, errors in single qubits won’t ruin a computation. The information that matters is somehow everywhere, yet nowhere in particular.

    “You can’t pin it down to any individual qubit,” Cubitt said.

    All quantum error-correcting codes can absorb at least one error without any effect on the encoded information, but they will all eventually succumb as errors accumulate. That’s where the second property of quantum error-correcting codes kicks in—the actual error correction. This is closely related to local indistinguishability: Because errors in individual qubits don’t destroy any information, it’s always possible to reverse any error using established procedures specific to each code.
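
    Shor’s full nine-qubit code is beyond a short sketch, but its bit-flip ingredient, the three-qubit repetition code, shows the error-correction step in miniature: parity checks reveal which qubit flipped without revealing the encoded amplitudes, so the error can be undone. (This toy code protects against bit flips only, and unlike full quantum codes its computational-basis states are locally distinguishable, so it lacks the stronger property described above.)

```python
# Three-qubit bit-flip code. A state is a list of 8 complex amplitudes;
# bit q of the basis index is qubit q.

def encode(alpha, beta):
    # Logical alpha|0_L> + beta|1_L>, with |0_L> = |000>, |1_L> = |111>.
    s = [0j] * 8
    s[0b000], s[0b111] = alpha, beta
    return s

def apply_x(state, q):
    # Pauli X (bit flip) on qubit q: swap amplitudes whose indices
    # differ in bit q.
    out = [0j] * 8
    for i, a in enumerate(state):
        out[i ^ (1 << q)] = a
    return out

def syndrome(state):
    # Parity checks Z0Z1 and Z1Z2. Every basis state in the support gives
    # the same parities, so this locates the error while revealing
    # nothing about alpha and beta.
    i = next(i for i, a in enumerate(state) if abs(a) > 1e-12)
    b = [(i >> q) & 1 for q in range(3)]
    return (b[0] ^ b[1], b[1] ^ b[2])

def correct(state):
    # Map syndrome to the flipped qubit (None means no error), then undo it.
    flipped = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}[syndrome(state)]
    return state if flipped is None else apply_x(state, flipped)

noisy = apply_x(encode(0.6, 0.8), 1)   # bit-flip error on the middle qubit
fixed = correct(noisy)
print(fixed[0b000], fixed[0b111])       # the logical amplitudes survive
```

    The same recover-from-a-syndrome pattern, repeated for phase flips and nested across nine qubits, is what lets Shor’s code reverse any single-qubit error.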

    Taken for a Ride

    Zhi Li, a postdoc at the Perimeter Institute for Theoretical Physics in Waterloo, Canada, was well versed in the theory of quantum error correction. But the subject was far from his mind when he struck up a conversation with his colleague Latham Boyle. It was the fall of 2022, and the two physicists were on an evening shuttle from Waterloo to Toronto. Boyle, an expert in aperiodic tilings who lived in Toronto at the time and is now at the University of Edinburgh, was a familiar face on those shuttle rides, which often got stuck in heavy traffic.

    “Normally they could be very miserable,” Boyle said. “This was like the greatest one of all time.”

    Before that fateful evening, Li and Boyle knew of each other’s work, but their research areas didn’t directly overlap, and they’d never had a one-on-one conversation. But like countless researchers in unrelated fields, Li was curious about aperiodic tilings. “It’s very hard to be not interested,” he said.

  • Selective Forgetting Can Help AI Learn Better

    The original version of this story appeared in Quanta Magazine.

    A team of computer scientists has created a nimbler, more flexible type of machine learning model. The trick: It must periodically forget what it knows. And while this new approach won’t displace the huge models that undergird the biggest apps, it could reveal more about how these programs understand language.

    The new research marks “a significant advance in the field,” said Jea Kwon, an AI engineer at the Institute for Basic Science in South Korea.

    The AI language engines in use today are mostly powered by artificial neural networks. Each “neuron” in the network is a mathematical function that receives signals from other such neurons, runs some calculations, and sends signals on through multiple layers of neurons. Initially the flow of information is more or less random, but through training, it improves as the network adapts to the training data. If an AI researcher wants to create a bilingual model, for example, she would train the model with a big pile of text from both languages, which would adjust the connections between neurons in such a way as to relate the text in one language with equivalent words in the other.

    But this training process takes a lot of computing power. If the model doesn’t work very well, or if the user’s needs change later on, it’s hard to adapt it. “Say you have a model that has 100 languages, but imagine that one language you want is not covered,” said Mikel Artetxe, a coauthor of the new research and founder of the AI startup Reka. “You could start over from scratch, but it’s not ideal.”

    Artetxe and his colleagues have tried to circumvent these limitations. A few years ago, Artetxe and others trained a neural network in one language, then erased what it knew about the building blocks of words, called tokens. These are stored in the first layer of the neural network, called the embedding layer. They left all the other layers of the model alone. After erasing the tokens of the first language, they retrained the model on the second language, which filled the embedding layer with new tokens from that language.
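
    The weight surgery described above can be sketched schematically: keep every deeper layer’s weights as-is, and reinitialize only the embedding table to the new language’s vocabulary size before retraining. The toy model, layer names, and sizes below are invented; real implementations perform this on an actual transformer’s embedding matrix.

```python
import random

random.seed(0)

def new_matrix(rows, cols):
    # Small random weights, as a fresh (untrained) layer would have.
    return [[random.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

DIM = 16
model = {
    # Language-specific: one row of weights per token of language 1.
    "embedding": new_matrix(100, DIM),
    # Stand-ins for the deeper, more abstract layers that are kept.
    "hidden_1": new_matrix(DIM, DIM),
    "hidden_2": new_matrix(DIM, DIM),
}

def swap_embeddings(model, new_vocab_size):
    # Erase only the token embeddings; leave every deeper layer untouched,
    # ready for retraining on the second language.
    model["embedding"] = new_matrix(new_vocab_size, DIM)
    return model

deep_before = model["hidden_1"]
swap_embeddings(model, 120)              # language 2 has 120 tokens here
print(len(model["embedding"]))            # one fresh row per new token
assert model["hidden_1"] is deep_before   # deeper layers preserved
```

    Retraining such a model on the second language fills the new embedding rows while the preserved deeper layers supply the language-agnostic structure the researchers describe.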

    Even though the model contained mismatched information, the retraining worked: The model could learn and process the new language. The researchers surmised that while the embedding layer stored information specific to the words used in the language, the deeper levels of the network stored more abstract information about the concepts behind human languages, which then helped the model learn the second language.

    “We live in the same world. We conceptualize the same things with different words” in different languages, said Yihong Chen, the lead author of the recent paper. “That’s why you have this same high-level reasoning in the model. An apple is something sweet and juicy, instead of just a word.”
