Home / Tech / OpenAI says AI browsers may always be vulnerable to prompt injection attacks

OpenAI says AI browsers may always be vulnerable to prompt injection attacks

Spread the love

Even as OpenAI works to harden its Atlas AI Browser against cyberattacks, the company acknowledges that spot injection, a type of attack that manipulates AI agents into following malicious instructions often hidden in web pages or emails, is a risk that won’t go away anytime soon — raising questions about how safe it is for AI agents to operate on the open web.

“Instant injection, like so many scams and social engineering on the Internet, is unlikely to ever be fully solved,” OpenAI wrote on Monday. Blog post Details of how the company enhanced Atlas Shield to combat persistent attacks. The company admitted that ChatGPT Atlas’ “proxy mode” “expands the security threat surface.”

OpenAI launched its ChatGPT Atlas browser in October, and security researchers were quick to publish their demos, showing that it was possible to type a few words into Google Docs that were able to change the behavior of the underlying browser. On the same day, brave one Published a blog post Explaining that instantaneous indirect injection poses a methodological challenge for AI-powered browsers, including Perplexity’s Comet browser.

OpenAI isn’t the only one that realizes that instantaneous injection is not going away. the The UK’s National Cyber ​​Security Center warned earlier this month Rapid injection attacks against generative AI applications “may never be fully mitigated,” putting websites at risk of falling victim to data breaches. The British government agency advised cyber professionals to reduce the risks and impact of immediate injections, rather than considering whether attacks could be “stopped”.

For OpenAI’s part, the company said: “We view instantaneous injection as a long-term security challenge for AI, and we will need to continually strengthen our defenses against it.”

See also  Redwood Materials launches energy storage business and its first target is AI data centers

The company’s response to this absurd task? A proactive and responsive cycle that the company says shows early promise in helping discover new attack strategies internally before they are exploited “in the wild.”

This isn’t all that different from what competitors like Anthropic and Google are saying: To combat the constant threat of agile attacks, defenses must be layered and continually tested. Google’s latest workFor example, it focuses on architectural and policy-level controls for agent systems.

But where OpenAI takes a different approach, it uses an “LLM-based automated attacker.” This attacker is essentially a bot that OpenAI has trained, using reinforcement learning, to play the role of a hacker looking for ways to sneak malicious instructions into an AI agent.

The bot can test the attack in the simulation before using it for real, and the simulator shows how the targeted AI will think and what actions it will take if it sees the attack. The robot can then study this response, modify the attack, and try again and again. This deep insight into the target AI’s internal heuristics is something outsiders can’t access, so, in theory, an OpenAI bot should be able to find flaws faster than any real-world attacker.

It’s a common tactic in AI health testing: build an agent to quickly find edge cases and test them in simulation.

“our [reinforcement learning]“Trained attackers can direct an agent to execute complex, long-running malicious workflows that unfold over dozens (or even hundreds) of steps,” OpenAI wrote. “We also observed new attack strategies that did not appear in the human red team campaign or external reports.”

See also  After Shopify bought his last startup, Birk Jernström wants to help developers build one-person unicorns
Screenshot showing the injection attack in the OpenAI browser.
Image credits:OpenAI

In a demo (pictured above), OpenAI showed how its automated attacker injected a malicious email into a user’s inbox. When the AI ​​agent later checked the inbox, he followed the instructions hidden in the email and sent a resignation letter instead of drafting an out-of-office response. But after the security update, Proxy Mode was able to successfully detect the instant injection attempt and notify the user about it, according to the company.

The company says that although it is difficult to secure instantaneous injections in a foolproof manner, it relies on extensive testing and faster patch cycles to harden its systems before they appear in real attacks.

An OpenAI spokesperson declined to share whether the Atlas security update has resulted in a meaningful reduction in successful injections, but says the company has been working with third parties to harden Atlas against instantaneous injections since before launch.

Reinforcement learning is one way to continually adapt to an attacker’s behavior, but it’s only part of the picture, says Rami McCarthy, principal security researcher at cybersecurity firm Wiz.

“One useful way to think about risk in AI systems is autonomy multiplied by access,” McCarthy told TechCrunch.

“Proxy browsers tend to sit in a challenging part of that space: moderate independence combined with very high accessibility,” McCarthy said. “Many current recommendations reflect that trade-off. Limiting access during login primarily reduces exposure, while requiring review of confirmation requests restricts independence.”

These are two of OpenAI’s recommendations for users to reduce their risks, and an Atlas spokesperson said it is also trained to obtain user confirmation before sending messages or making payments. OpenAI also suggests that users give agents specific instructions, rather than providing them with access to your inbox and telling them to “take desired action.”

See also  Indian crypto exchange CoinDCX confirms $44 million stolen during hack

“The broad scope of hidden or malicious content makes it easy to influence an agent, even when safeguards are in place,” according to OpenAI.

While OpenAI says protecting Atlas users from spot injections is a top priority, McCarthy raises some doubts about the return on investment for vulnerable browsers.

“For most everyday use cases, proxy browsers do not yet offer enough value to justify their current risk profile,” McCarthy told TechCrunch. “The risks are high given their ability to access sensitive data like email and payment information, although this access is also what makes them powerful. This balance will evolve, but today the trade-offs are still very real.”

Source link

Tagged: