Tag: neural networks

Transformers: Why the T in ChatGPT is AI’s biggest breakthrough – and greatest risk


A good neural network architecture is vital when developing artificial intelligence

SHUTTERSTOCK/Qpt

When ChatGPT first took the world by storm in 2022, its capabilities were so impressive that people happily looked past its awkward name. Yet hidden within those initials lies a key breakthrough responsible for sending artificial intelligence rocketing these past few years – and potentially a limitation that could see it crashing back to Earth.

GPT stands for generative pre-trained transformer, and it is the last word that matters most. The term was coined in a 2017 paper by a team at Google, which introduced a concept called…


August 15, 2024
AI Is a Black Box. Anthropic Figured Out a Way to Look Inside


Last year, the team began experimenting with a tiny model that uses only a single layer of neurons. (Sophisticated LLMs have dozens of layers.) The hope was that in the simplest possible setting they could discover patterns that designate features. They ran countless experiments with no success. “We tried a whole bunch of stuff, and nothing was working. It looked like a bunch of random garbage,” says Tom Henighan, a member of Anthropic’s technical staff. Then a run dubbed “Johnny”—each experiment was assigned a random name—began associating neural patterns with concepts that appeared in its outputs.

“Chris looked at it, and he was like, ‘Holy crap. This looks great,’” says Henighan, who was stunned as well. “I looked at it, and was like, ‘Oh, wow, wait, is this working?’”

Suddenly the researchers could identify the features a group of neurons were encoding. They could peer into the black box. Henighan says he identified the first five features he looked at. One group of neurons signified Russian texts. Another was associated with mathematical functions in the Python computer language. And so on.

Once they showed they could identify features in the tiny model, the researchers set about the hairier task of decoding a full-size LLM in the wild. They used Claude Sonnet, the medium-strength version of Anthropic’s three current models. That worked, too. One feature that stuck out to them was associated with the Golden Gate Bridge. They mapped out the set of neurons that, when fired together, indicated that Claude was “thinking” about the massive structure that links San Francisco to Marin County. What’s more, when similar sets of neurons fired, they evoked subjects that were Golden Gate Bridge-adjacent: Alcatraz, California Governor Gavin Newsom, and the Hitchcock movie Vertigo, which was set in San Francisco. All told the team identified millions of features—a sort of Rosetta Stone to decode Claude’s neural net. Many of the features were safety-related, including “getting close to someone for some ulterior motive,” “discussion of biological warfare,” and “villainous plots to take over the world.”

The Anthropic team then took the next step, to see if they could use that information to change Claude’s behavior. They began manipulating the neural net to augment or diminish certain concepts—a kind of AI brain surgery, with the potential to make LLMs safer and augment their power in selected areas. “Let’s say we have this board of features. We turn on the model, one of them lights up, and we see, ‘Oh, it’s thinking about the Golden Gate Bridge,’” says Shan Carter, an Anthropic scientist on the team. “So now, we’re thinking, what if we put a little dial on all these? And what if we turn that dial?”

So far, the answer to that question seems to be that it’s very important to turn the dial the right amount. The team found several features that represented dangerous practices, like unsafe computer code, scam emails, and instructions for making dangerous products. By suppressing those features, Anthropic says, the model can produce safer computer programs and reduce bias.
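The "dial" can be pictured with a toy sketch. It assumes, purely for illustration, that a feature corresponds to a single direction in a vector of neuron activations and that turning the dial rescales the activation's component along that direction. The article does not describe Anthropic's actual mechanism, and the function names and numbers here are hypothetical.

```python
# Toy illustration only: names, vectors and the "one direction per feature"
# picture are assumptions, not Anthropic's implementation.

def dot(u, v):
    """Dot product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(u, v))


def feature_strength(activation, feature_direction):
    """How strongly the feature is present: the activation's coefficient along the direction."""
    norm_sq = dot(feature_direction, feature_direction)
    if norm_sq == 0:
        raise ValueError("feature_direction must be non-zero")
    return dot(activation, feature_direction) / norm_sq


def steer(activation, feature_direction, dial):
    """Rescale the feature's component of `activation` by `dial`.

    dial = 1.0 leaves the activation unchanged, 0.0 suppresses the feature,
    and values above 1.0 amplify it.
    """
    strength = feature_strength(activation, feature_direction)
    # Adding (dial - 1) * strength * direction turns the existing component
    # strength * direction into dial * strength * direction.
    return [a + (dial - 1.0) * strength * d
            for a, d in zip(activation, feature_direction)]


# Hypothetical three-neuron activation whose first axis is the "bridge" feature.
activation = [2.0, 1.0, 0.0]
bridge_feature = [1.0, 0.0, 0.0]

suppressed = steer(activation, bridge_feature, 0.0)  # feature switched off
amplified = steer(activation, bridge_feature, 3.0)   # feature tripled
```

In this picture, turning the dial too far in either direction distorts the activation well beyond its usual range, which is one way to read the point about turning it the right amount.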


May 21, 2024
Inside the Creation of DBRX, the World’s Most Powerful Open Source AI Model


This past Monday, about a dozen engineers and executives at data science and AI company Databricks gathered in conference rooms connected via Zoom to learn if they had succeeded in building a top artificial intelligence language model. The team had spent months, and about $10 million, training DBRX, a large language model similar in design to the one behind OpenAI’s ChatGPT. But they wouldn’t know how powerful their creation was until results came back from the final tests of its abilities.

“We’ve surpassed everything,” Jonathan Frankle, chief neural network architect at Databricks and leader of the team that built DBRX, eventually told the team, which responded with whoops, cheers, and applause emojis. Frankle usually steers clear of caffeine but was taking sips of iced latte after pulling an all-nighter to write up the results.

Databricks will release DBRX under an open source license, allowing others to build on top of its work. Frankle shared data showing that across about a dozen benchmarks measuring the AI model’s ability to answer general knowledge questions, perform reading comprehension, solve vexing logical puzzles, and generate high-quality code, DBRX was better than every other open source model available.

AI decision makers: Jonathan Frankle, Naveen Rao, Ali Ghodsi, and Hanlin Tang. Photograph: Gabriela Hasbun

It outshined Meta’s Llama 2 and Mistral’s Mixtral, two of the most popular open source AI models available today. “Yes!” shouted Ali Ghodsi, CEO of Databricks, when the scores appeared. “Wait, did we beat Elon’s thing?” Frankle replied that they had indeed surpassed the Grok AI model recently open-sourced by Musk’s xAI, adding, “I will consider it a success if we get a mean tweet from him.”

To the team’s surprise, on several scores DBRX was also shockingly close to GPT-4, OpenAI’s closed model that powers ChatGPT and is widely considered the pinnacle of machine intelligence. “We’ve set a new state of the art for open source LLMs,” Frankle said with a super-sized grin.

Building Blocks

By open-sourcing DBRX, Databricks is adding further momentum to a movement that is challenging the secretive approach of the most prominent companies in the current generative AI boom. OpenAI and Google keep the code for their GPT-4 and Gemini large language models closely held, but some rivals, notably Meta, have released their models for others to use, arguing that it will spur innovation by putting the technology in the hands of more researchers, entrepreneurs, startups, and established businesses.

Databricks says it also wants to open up about the work involved in creating its open source model, something that Meta has not done for some key details about the creation of its Llama 2 model. The company will release a blog post detailing the work involved to create the model, and also invited WIRED to spend time with Databricks engineers as they made key decisions during the final stages of the multimillion-dollar process of training DBRX. That provided a glimpse of how complex and challenging it is to build a leading AI model—but also how recent innovations in the field promise to bring down costs. That, combined with the availability of open source models like DBRX, suggests that AI development isn’t about to slow down any time soon.

Ali Farhadi, CEO of the Allen Institute for AI, says greater transparency around the building and training of AI models is badly needed. The field has become increasingly secretive in recent years as companies have sought an edge over competitors. Transparency is especially important when there is concern about the risks that advanced AI models could pose, he says. “I’m very happy to see any effort in openness,” Farhadi says. “I do believe a significant portion of the market will move towards open models. We need more of this.”


March 27, 2024
AI could help replicate smells in danger of being lost to history


Some scents are at risk of vanishing forever. Can AI replicate them?

blickwinkel/Alamy

Artificial intelligence can whip up the formula to recreate a perfume based on its chemical composition. One day, it could use a lone sample to reproduce rare smells at risk of being lost, such as incense from a culturally specific ritual or the smell of a forest that is changing because of rising temperatures.

Idelfonso Nogueira at the Norwegian University of Science and Technology and his colleagues profiled two existing fragrances, categorising them by scent family – subjective words such as “spicy” or “musk” commonly used to describe perfume – and so-called “odour value”, a measure of how intense a certain smell is. For instance, one of the fragrances scored the highest odour value for “coumarinic”, a family of scents similar to vanilla. The other received the highest odour value for the scent family “alcoholic”.

To train a neural network, the researchers used a database of known molecules associated with specific fragrance notes. The AI learned to generate an array of molecules that matched the odour scores for each scent family of the sample fragrances.

But merely generating those molecules was not enough to reproduce the target fragrances, says Nogueira, because the way we perceive smell is affected by the physical and chemical processes molecules go through when they interact with air or skin. Immediately after being sprayed, a perfume’s “top notes” are most noticeable, but they vanish within minutes as molecules evaporate, leaving “base notes” that can linger for days. To address this, the team chose molecules generated by the AI that evaporated under conditions similar to those of the molecules in the original fragrances.

Finally, they again used AI to minimise any mismatches between the odour values of the original mixture and the AI-generated mixture. Their ultimate recipe for one of the fragrances showed small deviations with respect to its “coumarinic” and “sharp” notes, while the other seemed to be a very precise replica.
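The idea of minimising the mismatch between two sets of odour values can be illustrated with a small sketch. It rests on strong assumptions that are not in the article: each candidate molecule has a fixed odour-value profile across scent families, the profiles blend linearly by weight, and the weights are found by plain gradient descent rather than by the AI model the researchers used. All names and numbers are hypothetical.

```python
# Illustrative sketch only: linear blending and gradient descent are
# assumptions made for this example, not the researchers' method.

def blend(weights, molecule_profiles):
    """Odour value per scent family for a weighted mixture of molecules."""
    families = len(molecule_profiles[0])
    return [sum(w * p[f] for w, p in zip(weights, molecule_profiles))
            for f in range(families)]


def mismatch(weights, molecule_profiles, target):
    """Sum of squared differences between blended and target odour values."""
    return sum((b - t) ** 2
               for b, t in zip(blend(weights, molecule_profiles), target))


def fit_mixture(molecule_profiles, target, steps=2000, lr=0.01):
    """Find non-negative mixture weights that reduce the mismatch."""
    weights = [0.0] * len(molecule_profiles)
    for _ in range(steps):
        blended = blend(weights, molecule_profiles)
        residual = [b - t for b, t in zip(blended, target)]
        for i, profile in enumerate(molecule_profiles):
            # Gradient of the squared mismatch with respect to weight i.
            grad = 2.0 * sum(r * x for r, x in zip(residual, profile))
            # Step downhill, then clip so no molecule has a negative amount.
            weights[i] = max(0.0, weights[i] - lr * grad)
    return weights


# Hypothetical profiles over two scent families: ("coumarinic", "alcoholic").
profiles = [[1.0, 0.0], [0.0, 1.0]]
target = [0.7, 0.3]
weights = fit_mixture(profiles, target)
```

A real pipeline would also have to respect the evaporation constraint described earlier, which this sketch leaves out.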

Predicting what a chemical will smell like is notoriously difficult, so the researchers used a limited number of molecules in their training data. But the process could be even more precise if the database is expanded to contain more – and more complex – molecules, says Nogueira. He suggests AI could help the perfume industry create recipes that produce a cheaper, more sustainable version of a fragrance. Currently, experts estimate developing a new perfume with traditional techniques can take up to three years and cost as much as $50,000 per kilogram.

Richard Gerkin at Arizona State University and Osmo, a start-up aiming to teach computers to generate smells as AI can with images, says combining AI with physics and chemistry is a strength of this approach because it accounts for often overlooked subtleties such as how smells evaporate. But the effectiveness of this process still has to be confirmed in studies with people, he says.

Nogueira and his colleagues have already nearly gotten there. In a few weeks, he will be off to a colleague’s lab in Ljubljana, Slovenia, to experience some of the AI-generated fragrances himself. “I am very excited to smell them,” he says.


March 1, 2024