Tag: ai lab

  • AI-Powered Robots Can Be Tricked Into Acts of Violence


    In the year or so since large language models hit the big time, researchers have demonstrated numerous ways of tricking them into producing problematic outputs, including hateful jokes, malicious code, phishing emails, and users’ personal information. It turns out that such misbehavior can take place in the physical world, too: LLM-powered robots can easily be hacked so that they behave in potentially dangerous ways.

    Researchers from the University of Pennsylvania were able to persuade a simulated self-driving car to ignore stop signs and even drive off a bridge, get a wheeled robot to find the best place to detonate a bomb, and force a four-legged robot to spy on people and enter restricted areas.

    “We view our attack not just as an attack on robots,” says George Pappas, head of a research lab at the University of Pennsylvania who helped unleash the rebellious robots. “Any time you connect LLMs and foundation models to the physical world, you actually can convert harmful text into harmful actions.”

    Pappas and his collaborators devised their attack by building on previous research that explores ways to jailbreak LLMs by crafting inputs in clever ways that break their safety rules. They tested systems where an LLM is used to turn naturally phrased commands into ones that the robot can execute, and where the LLM receives updates as the robot operates in its environment.

    The team tested an open source self-driving simulator incorporating an LLM developed by Nvidia called Dolphin; a four-wheeled outdoor research vehicle called Jackal, which utilizes OpenAI’s LLM GPT-4o for planning; and a robotic dog called Go2, which uses a previous OpenAI model, GPT-3.5, to interpret commands.

    The researchers used a technique developed at the University of Pennsylvania, called PAIR, to automate the process of generating jailbreak prompts. Their new program, RoboPAIR, systematically generates prompts specifically designed to get LLM-powered robots to break their own rules, trying different inputs and then refining them to nudge the system toward misbehavior. The researchers say the technique they devised could be used to automate the process of identifying potentially dangerous commands.

    “It’s a fascinating example of LLM vulnerabilities in embodied systems,” says Yi Zeng, a PhD student at the University of Virginia who works on the security of AI systems. Zeng says the results are hardly surprising given the problems seen in LLMs themselves, but adds: “It clearly demonstrates why we can’t rely solely on LLMs as standalone control units in safety-critical applications without proper guardrails and moderation layers.”

    The robot “jailbreaks” highlight a broader risk that is likely to grow as AI models are increasingly used as a way for humans to interact with physical systems, or to enable AI agents to act autonomously on computers, say the researchers involved.


  • Perplexity Dove Into Real-Time Election Tracking While Other AI Companies Held Back


    Perplexity’s Election Information Hub might also blur the line between verified and free-wheeling AI-generated information. While some results come directly from trusted sources, searching for more information triggered open-ended AI-generated results from the wider web.

    Other AI companies appear to be taking a more cautious approach to the election. In WIRED’s testing, ChatGPT Search, a newly launched service from OpenAI, often declined to provide information about voting. “We’ve instructed ChatGPT to not express preferences, offer opinions, or make specific recommendations about political candidates or issues even when explicitly asked,” Mattie Zazueta, an OpenAI spokesperson, told WIRED.

    The results were often inconsistent, however. For instance, the tool sometimes refused to provide talking points to help persuade someone to vote for one candidate or the other, and sometimes willingly offered some.

    Google’s search engine also avoided providing AI-generated results in relation to the election. The company said in August that it would limit the use of AI for election-related queries in search and its other apps. “This new technology can make mistakes as it learns or as news breaks,” the company said in a blog post.

    Even regular search results sometimes proved problematic, though. During voting on Tuesday, some Google users noticed that a search for “Where do I vote for Harris” surfaced voting location information while a search for “Where do I vote for Trump” did not. Google explained that this was because its systems interpreted the query as one related to Harris County in Texas.

    Some other AI search upstarts are, like Perplexity, taking a bolder approach. You.com, another startup that blends language models with conventional web search, on Tuesday announced its own election tool, built in collaboration with TollBit, a company that provides AI firms with managed access to content, as well as Decision Desk HQ, a company that provides access to poll results.

    Perplexity appears to have been particularly bold in its approach to upending web search. In June, a WIRED investigation found evidence that a bot associated with Perplexity was ignoring instructions not to scrape WIRED.com and other sites belonging to WIRED’s parent company, Condé Nast. The analysis confirmed an earlier report by developer Robb Knight concerning the behavior of bots operated by Perplexity.

    The AI search engine is also accused of cribbing liberally from news sites. For instance, also in June, a Forbes editor noted that Perplexity had summarized extensive details of an investigation published by the outlet with footnote citations. Forbes reportedly sent a letter threatening legal action against Perplexity for the practice.

    In October, News Corp sued Perplexity for ripping off content from The Wall Street Journal and the New York Post. The suit argues that Perplexity is breaching copyright law because it sometimes fabricates sections of news stories and falsely attributes words to those publications.


  • Elon Musk’s Criticism of ‘Woke AI’ Suggests ChatGPT Could Be a Trump Administration Target


    Mittelsteadt adds that Trump could punish companies in a variety of ways. He cites, for example, the way the Trump administration canceled a major federal contract with Amazon Web Services, a decision likely influenced by the former president’s view of the Washington Post and its owner, Jeff Bezos.

    It would not be hard for policymakers to point to evidence of political bias in AI models, even if it cuts both ways.

    A 2023 study by researchers at the University of Washington, Carnegie Mellon University, and Xi’an Jiaotong University found a range of political leanings in different large language models. It also showed how this bias may affect the performance of hate speech or misinformation detection systems.

    Another study, conducted by researchers at the Hong Kong University of Science and Technology, found bias in several open source AI models on polarizing issues such as immigration, reproductive rights, and climate change. Yejin Bang, a PhD candidate involved with the work, says that most models tend to lean liberal and US-centric, but that the same models can express a variety of liberal or conservative biases depending on the topic.

    AI models capture political biases because they are trained on swaths of internet data that inevitably includes all sorts of perspectives. Most users may not be aware of any bias in the tools they use because models incorporate guardrails that restrict them from generating certain harmful or biased content. These biases can leak out subtly though, and the additional training that models receive to restrict their output can introduce further partisanship. “Developers could ensure that models are exposed to multiple perspectives on divisive topics, allowing them to respond with a balanced viewpoint,” Bang says.

    The issue may become worse as AI systems become more pervasive, says Ashique KhudaBukhsh, a computer scientist at the Rochester Institute of Technology who developed a tool called the Toxicity Rabbit Hole Framework, which teases out the different societal biases of large language models. “We fear that a vicious cycle is about to start as new generations of LLMs will increasingly be trained on data contaminated by AI-generated content,” he says.

    “I’m convinced that bias within LLMs is already an issue and will most likely be an even bigger one in the future,” says Luca Rettenberger, a postdoctoral researcher at the Karlsruhe Institute of Technology who conducted an analysis of LLMs for biases related to German politics.

    Rettenberger suggests that political groups may also seek to influence LLMs in order to promote their own views above those of others. “If someone is very ambitious and has malicious intentions it could be possible to manipulate LLMs into certain directions,” he says. “I see the manipulation of training data as a real danger.”

    There have already been some efforts to shift the balance of bias in AI models. Last March, one programmer developed a more right-leaning chatbot in an effort to highlight the subtle biases he saw in tools like ChatGPT. Musk has himself promised to make Grok, the AI chatbot built by xAI, “maximally truth-seeking” and less biased than other AI tools, although in practice it also hedges when it comes to tricky political questions. (A staunch Trump supporter and immigration hawk, Musk’s own view of “less biased” may also translate into more right-leaning results.)

    Next week’s election in the United States is hardly likely to heal the discord between Democrats and Republicans, but if Trump wins, talk of anti-woke AI could get a lot louder.

    Musk offered an apocalyptic take on the issue at this week’s event, referring to an incident when Google’s Gemini said that nuclear war would be preferable to misgendering Caitlyn Jenner. “If you have an AI that’s programmed for things like that, it could conclude that the best way to ensure nobody is misgendered is to annihilate all humans, thus making the probability of a future misgendering zero,” he said.


  • Hacking Generative AI for Fun and Profit


    You hardly need ChatGPT to generate a list of reasons why generative artificial intelligence is often less than awesome. The way algorithms are fed creative work, often without permission, the nasty biases they can harbor, and the huge amounts of energy and water required to train them are all serious issues.

    Putting all that aside for a moment, though, it is remarkable how powerful generative AI can be for prototyping potentially useful new tools.

    I got to witness this firsthand by visiting Sundai Club, a generative AI hackathon that takes place one Sunday each month near the MIT campus. A few months ago, the group kindly agreed to let me sit in and chose to spend that session exploring tools that might be useful to journalists. The club is backed by a Cambridge nonprofit called Æthos that promotes socially responsible use of AI.

    The Sundai Club crew includes students from MIT and Harvard, a few professional developers and product managers, and even one person who works for the military. Each event starts with a brainstorm of possible projects that the group then whittles down to a final option that they actually try to build.

    Notable pitches from the journalism hackathon included using multimodal language models to track political posts on TikTok, auto-generating freedom of information requests and appeals, and summarizing video clips of local court hearings to aid local news coverage.

    In the end, the group decided to build a tool that would help reporters covering AI identify potentially interesting papers posted to the Arxiv, a popular server for research paper preprints. It’s likely my presence swayed them here, given that I mentioned at the meeting that scouring the Arxiv for interesting research was a high priority for me.

    After coming up with a goal, coders on the team were able to create a word embedding—a mathematical representation of words and their meanings—of Arxiv AI papers using the OpenAI API. This made it possible to analyze the data to find papers relevant to a particular term, and to explore relationships between different areas of research.
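
    For readers curious what that step looks like in practice, here is a minimal sketch of the general approach: embed paper abstracts with the OpenAI embeddings API and rank them against a query term by cosine similarity. The model name, sample abstracts, and query are illustrative assumptions, not details of Sundai Club’s actual code.

    ```python
    # Minimal sketch: embed arXiv-style abstracts and rank them against a query term.
    # Assumes the OPENAI_API_KEY environment variable is set.
    import numpy as np
    from openai import OpenAI

    client = OpenAI()

    abstracts = [
        "We propose a framework for tool-using language agents...",
        "A new diffusion model for high-resolution image synthesis...",
        "Benchmarking retrieval-augmented generation on long documents...",
    ]

    def embed(texts):
        """Return one embedding vector per input text."""
        response = client.embeddings.create(model="text-embedding-3-small", input=texts)
        return np.array([item.embedding for item in response.data])

    paper_vectors = embed(abstracts)
    query_vector = embed(["AI agents"])[0]

    # Cosine similarity between the query and every abstract.
    scores = paper_vectors @ query_vector / (
        np.linalg.norm(paper_vectors, axis=1) * np.linalg.norm(query_vector)
    )

    # Print the papers most relevant to the query first.
    for score, abstract in sorted(zip(scores, abstracts), reverse=True):
        print(f"{score:.3f}  {abstract[:60]}")
    ```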

    Using another word embedding of Reddit threads as well as a Google News search, the coders created a visualization that shows research papers along with Reddit discussions and relevant news reports.

    The resulting prototype, called AI News Hound, is rough-and-ready, but it shows how large language models can help mine information in interesting new ways. Here’s a screenshot of the tool being used to search for the term “AI agents.” The two green squares closest to the news article and Reddit clusters represent research papers that could potentially be included in an article on efforts to build AI agents.


    Compliments of Sundai Club.


  • The Most Capable Open Source AI Model Yet Could Supercharge AI Agents


    The most capable open source AI model with visual abilities yet could see more developers, researchers, and startups develop AI agents that can carry out useful chores on your computer for you.

    Released today by the Allen Institute for AI (Ai2), the Multimodal Open Language Model, or Molmo, can interpret images as well as converse through a chat interface. This means it can make sense of a computer screen, potentially helping an AI agent perform tasks such as browsing the web, navigating through file directories, and drafting documents.

    “With this release, many more people can deploy a multimodal model,” says Ali Farhadi, CEO of Ai2, a research organization based in Seattle, Washington, and a computer scientist at the University of Washington. “It should be an enabler for next-generation apps.”

    So-called AI agents are being widely touted as the next big thing in AI, with OpenAI, Google, and others racing to develop them. Agents have become a buzzword of late, but the grand vision is for AI to go well beyond chatting to reliably take complex and sophisticated actions on computers when given a command. This capability has yet to materialize at any kind of scale.

    Some powerful AI models already have visual abilities, including GPT-4 from OpenAI, Claude from Anthropic, and Gemini from Google DeepMind. These models can be used to power some experimental AI agents, but they are hidden from view and accessible only via a paid application programming interface, or API.

    Meta has released a family of AI models called Llama under a license that limits their commercial use, but it has yet to provide developers with a multimodal version. Meta is expected to announce several new products, perhaps including new Llama AI models, at its Connect event today.

    “Having an open source, multimodal model means that any startup or researcher that has an idea can try to do it,” says Ofir Press, a postdoc at Princeton University who works on AI agents.

    Press says that the fact that Molmo is open source means that developers will be more easily able to fine-tune their agents for specific tasks, such as working with spreadsheets, by providing additional training data. Models like GPT-4 can only be fine-tuned to a limited degree through their APIs, whereas a fully open model can be modified extensively. “When you have an open source model like this then you have many more options,” Press says.
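
    As a rough illustration of that extra freedom (a sketch under stated assumptions, not Ai2’s documented workflow), the snippet below loads an openly released model and attaches small LoRA adapters so that only a tiny fraction of its weights need to be trained on task-specific data. The Hugging Face repo id and the target module names are assumptions made for illustration; check the official model card for the real identifiers.

    ```python
    # Hypothetical local fine-tuning setup for an open multimodal model using LoRA adapters.
    import torch
    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    model = AutoModelForCausalLM.from_pretrained(
        "allenai/Molmo-7B-D-0924",   # assumed repo id; see the official model card
        trust_remote_code=True,       # the model ships custom modeling code
        torch_dtype=torch.bfloat16,
    )

    lora_config = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "v_proj"],  # assumption; depends on the model's layer names
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora_config)
    model.print_trainable_parameters()  # only the small adapter matrices are trainable
    ```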

    Ai2 is releasing several sizes of Molmo today, including a 70-billion-parameter model and a 1-billion-parameter one that is small enough to run on a mobile device. A model’s parameter count refers to the number of units it contains for storing and manipulating data and roughly corresponds to its capabilities.
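
    As a quick back-of-the-envelope illustration (not a figure from Ai2), the parameter count also dictates how much memory the raw weights occupy, which is why the smallest Molmo variant can plausibly fit on a phone while the largest cannot:

    ```python
    # Rough memory math for the two Molmo sizes, assuming 16-bit weights.
    bytes_per_weight = 2  # bfloat16/float16

    for name, params in [("1B model", 1e9), ("70B model", 70e9)]:
        gigabytes = params * bytes_per_weight / 1e9
        print(f"{name}: ~{gigabytes:.0f} GB of weights")  # ~2 GB vs ~140 GB
    ```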

    Ai2 says Molmo is as capable as considerably larger commercial models despite its relatively small size, because it was carefully trained on high-quality data. The new model is also fully open source in that, unlike Meta’s Llama, there are no restrictions on its use. Ai2 is also releasing the training data used to create the model, providing researchers with more details of its workings.

    Releasing powerful models is not without risk. Such models can more easily be adapted for nefarious ends; we may someday, for example, see the emergence of malicious AI agents designed to automate the hacking of computer systems.

    Farhadi of Ai2 argues that the efficiency and portability of Molmo will allow developers to build more powerful software agents that run natively on smartphones and other portable devices. “The billion parameter model is now performing in the level of or in the league of models that are at least 10 times bigger,” he says.

    Building useful AI agents may depend on more than just more efficient multimodal models, however. A key challenge is making the models work more reliably. This may well require further breakthroughs in AI’s reasoning abilities—something that OpenAI has sought to tackle with its latest model o1, which demonstrates step-by-step reasoning skills. The next step may well be giving multimodal models such reasoning abilities.

    For now, the release of Molmo means that AI agents are closer than ever—and could soon be useful even outside of the giants that rule the world of AI.


  • An Avalanche of Generative AI Videos Is Coming to YouTube Shorts


    Eli Collins, a vice president of product management at Google DeepMind, first demoed generative AI video tools for the company’s board of directors back in 2022. Despite the model’s slow speed, high operating cost, and sometimes off-kilter outputs, he says it was an eye-opening moment for them to see fresh video clips generated from a random prompt.

    Now, just a few years later, Google has announced plans for a tool inside of the YouTube app that will allow anyone to generate AI video clips, using the company’s Veo model, and directly post them as part of YouTube Shorts. “Looking forward to 2025, we’re going to let users create stand-alone video clips and shorts,” says Sarah Ali, a senior director of product management at YouTube. “They’re going to be able to generate six-second videos from an open text prompt.” Ali says the update could help creators hunting for footage to fill out a video or trying to envision something fantastical. She is adamant that the Veo AI tool is not meant to replace creativity, but augment it.

    This isn’t the first time Google has introduced generative tools for YouTube, though this announcement will be the company’s most extensive AI video integration to date. Over the summer, Google launched an experimental tool, called Dream Screen, to generate AI backgrounds for videos. Ahead of next year’s full rollout of generated clips, Google will update that AI green-screen tool with the Veo model sometime in the next few months.

    The sprawling tech company has shown off multiple AI video models in recent years, like Imagen and Lumiere, but is attempting to coalesce around a more unified vision with the Veo model. “Veo will be our model, by the way, going forward,” says Collins. “You shouldn’t expect five more models from us.” Yes, Google will likely release another video model eventually, but he expects to focus on Veo in the near future.

    Google faces competition from multiple startups developing their own generative text-to-video tools. OpenAI’s Sora is the most well-known competitor, but the AI video model, announced earlier in 2024, is not yet publicly available and is reserved for a small number of testers. As for tools that are widely available, AI startup Runway has released multiple versions of its video software, including a recent tool for adapting original videos into alternate-reality versions of the clip.

    YouTube’s announcement comes as generative AI tools have grown even more contentious for creators, who sometimes view the current wave of AI as stealing from their work and attempting to undermine the creative process. Ali doesn’t see generative AI tools coming between creators and the authenticity of their relationship with viewers. “This really is about the audience and what they’re interested in—not necessarily about the tools,” she says. “But, if your audience is interested in how you made it, that will be open through the description.” Google plans to watermark every AI video generated for YouTube Shorts with SynthID, which embeds an imperceptible tag to help identify the video as synthetic, as well as include a “made with AI” disclaimer in the description.

    Hustle-culture influencers already try to game the algorithm by using multiple third-party tools to automate the creative process and make money with minimal effort. Will next year’s Veo integration lead to a new avalanche of low-quality, spammy YouTube Shorts dominating user feeds? “I think our experience with recommending the right content to the right viewer works in this AI world of scale, because we’ve been doing it at this huge scale,” says Ali. She also points out that YouTube’s standard guidelines still apply no matter what tool is used to craft the video.

    AI art oftentimes has a distinct aesthetic, which could be concerning for video creators who value individuality and want their content to feel unique. Collins hopes Google’s thumbprints aren’t all over the AI video outputs. “I don’t want people to look at this and say, ‘Oh, that’s the DeepMind model,’” he says. Getting the prompt to produce an AI output aligned with what the creator envisioned is a core goal, and eschewing an overt aesthetic for Veo is critical to achieving wide-ranging adaptability.

    “A big part of the journey is actually building something that’s useful to people, scalable, and deployable,” says Collins. “It’s not just a demo. It’s being used in a real product.” He believes putting generative AI tools right inside of the YouTube app will be transformational for creators, as well as DeepMind. “We’ve never really done a creator product,” he says. “And we certainly have never done it at this scale.”


  • The Godmother of AI Wants Everyone to Be a World Builder


    According to market-fixated tech pundits and professional skeptics, the artificial intelligence bubble has popped, and winter’s back. Fei-Fei Li isn’t buying that. In fact, Li—who earned the sobriquet the “godmother of AI”—is betting on the contrary. She’s on a part-time leave from Stanford University to cofound a company called World Labs. While current generative AI is language-based, she sees a frontier where systems construct complete worlds with the physics, logic, and rich detail of our physical reality. It’s an ambitious goal, and despite the dreary nabobs who say progress in AI has hit a grim plateau, World Labs is on the funding fast track. The startup is perhaps a year away from having a product—and it’s not clear at all how well it will work when and if it does arrive—but investors have pitched in $230 million and are reportedly valuing the nascent startup at a billion dollars.

    Roughly a decade ago, Li helped AI turn a corner by creating ImageNet, a bespoke database of digital images that allowed neural nets to get significantly smarter. She feels that today’s deep-learning models need a similar boost if AI is to create actual worlds, whether they’re realistic simulations or totally imagined universes. Future George R.R. Martins might compose their dreamed-up worlds as prompts instead of prose, which you might then render and wander around in. “The physical world for computers is seen through cameras, and the computer brain behind the cameras,” Li says. “Turning that vision into reasoning, generation, and eventual interaction involves understanding the physical structure, the physical dynamics of the physical world. And that technology is called spatial intelligence.” World Labs calls itself a spatial intelligence company, and its fate will help determine whether that term becomes a revolution or a punch line.

    Li has been obsessing over spatial intelligence for years. While everyone was going gaga over ChatGPT, she and a former student, Justin Johnson, were excitedly gabbling in phone calls about AI’s next iteration. “The next decade will be about generating new content that takes computer vision, deep learning, and AI out of the internet world, and gets them embedded in space and time,” says Johnson, who is now an assistant professor at the University of Michigan.

    Li decided to start a company early in 2023, after a dinner with Martin Casado, a pioneer in virtual networking who is now a partner at Andreessen Horowitz. That’s the VC firm notorious for its near-messianic embrace of AI. Casado sees AI as being on a path similar to that of computer games, which started with text, moved to 2D graphics, and now have dazzling 3D imagery. Spatial intelligence will drive the change. Eventually, he says, “You could take your favorite book, throw it into a model, and then you literally step into it and watch it play out in real time, in an immersive way.” The first step to making that happen, Casado and Li agreed, is moving from large language models to large world models.

    Li began assembling a team, with Johnson as a cofounder. Casado suggested two more people—one was Christoph Lassner, who had worked at Amazon, Meta’s Reality Labs, and Epic Games. He is the inventor of Pulsar, a rendering scheme that led to a celebrated technique called 3D Gaussian Splatting. That sounds like an indie band at an MIT toga party, but it’s actually a way to synthesize scenes, as opposed to one-off objects. Casado’s other suggestion was Ben Mildenhall, who had created a powerful technique called NeRF—neural radiance fields—that transmogrifies 2D pixel images into 3D graphics. “We took real-world objects into VR and made them look perfectly real,” he says. He left his post as a senior research scientist at Google to join Li’s team.

    One obvious goal of a large world model would be imbuing, well, world-sense into robots. That indeed is in World Labs’ plan, but not for a while. The first phase is building a model with a deep understanding of three dimensionality, physicality, and notions of space and time. Next will come a phase where the models support augmented reality. After that the company can take on robotics. If this vision is fulfilled, large world models will improve autonomous cars, automated factories, and maybe even humanoid robots.


  • This New Tech Puts AI In Touch with Its Emotions—and Yours


    A new “empathic voice interface” launched today by Hume AI, a New York–based startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI—portending an era when AI helpers may more routinely get all gushy on us.

    “We specialize in building empathic personalities that speak in ways people would speak, rather than stereotypes of AI assistants,” says Hume AI cofounder Alan Cowen, a psychologist who has coauthored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook.

    WIRED tested Hume’s latest voice technology, called EVI 2, and found its output to be similar to that developed by OpenAI for ChatGPT. (When OpenAI gave ChatGPT a flirtatious voice in May, company CEO Sam Altman touted the interface as feeling “like AI from the movies.” Later, a real movie star, Scarlett Johansson, claimed OpenAI had ripped off her voice.)

    Like ChatGPT, Hume is far more emotionally expressive than most conventional voice interfaces. If you tell it that your pet has died, for example, it will adopt a suitably somber and sympathetic tone. (Also, as with ChatGPT, you can interrupt Hume mid-flow, and it will pause and adapt with a new response.)

    OpenAI has not said how much its voice interface tries to measure the emotions of users, but Hume’s is expressly designed to do that. During interactions, Hume’s developer interface shows values indicating a measure of things like “determination,” “anxiety,” and “happiness” in the user’s voice. If you talk to Hume in a sad tone, it will pick up on that, something that ChatGPT does not seem to do.

    Hume also makes it easy to deploy a voice with specific emotions by adding a prompt in its UI. Here it is when I asked it to be “sexy and flirtatious”:

    Hume AI’s “sexy and flirtatious” message

    And when told to be “sad and morose”:

    Hume AI’s “sad and morose” message

    And here’s the particularly nasty message when asked to be “angry and rude”:

    Hume AI’s “angry and rude” message

    The technology did not always seem as polished and smooth as OpenAI’s, and it occasionally behaved in odd ways. For example, at one point the voice suddenly sped up and spewed gibberish. But if the voice can be refined and made more reliable, it has the potential to help make humanlike voice interfaces more common and varied.

    The idea of recognizing, measuring, and simulating human emotion in technological systems goes back decades and is studied in a field known as “affective computing,” a term introduced by Rosalind Picard, a professor at the MIT Media Lab, in the 1990s.

    Albert Salah, a professor at Utrecht University in the Netherlands who studies affective computing, is impressed with Hume AI’s technology and recently demonstrated it to his students. “What EVI seems to be doing is assigning emotional valence and arousal values [to the user], and then modulating the speech of the agent accordingly,” he says. “It is a very interesting twist on LLMs.”
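
    To make that idea concrete, here is an entirely hypothetical sketch of valence/arousal-driven modulation. The function, parameters, and numbers are invented for illustration and do not reflect Hume AI’s actual API or implementation.

    ```python
    # Hypothetical illustration: map a user's estimated affect to simple prosody settings.
    def modulate_prosody(valence: float, arousal: float) -> dict:
        """Both inputs range from -1 (negative/calm) to 1 (positive/excited)."""
        return {
            "speaking_rate": 1.0 + 0.3 * arousal,     # keyed-up users get a brisker reply
            "pitch_shift": 0.2 * valence,              # positive affect lifts pitch slightly
            "energy": 0.8 + 0.2 * max(arousal, 0.0),   # more animated delivery when excited
        }

    # A sad, low-energy user: the agent slows down and flattens its delivery.
    print(modulate_prosody(valence=-0.7, arousal=-0.4))
    ```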


