Attacks on GenAI Models Can Take Seconds, Often Succeed: Report
2024-10-10 20:34:19 | Author: securityboulevard.com

The proliferation of generative AI technologies and their rapid enterprise adoption are fueling a cyberthreat environment where bad actors only need seconds to run highly successful attacks and steal data.

It takes a hacker an average of 42 seconds to execute a jailbreak attack against large language model (LLM)-based applications, and 90% of successful attacks lead to the leaking of sensitive data, according to a study by Pillar Security. In addition, 20% of jailbreak attempts successfully bypass model guardrails, and threat actors need only five interactions with a generative AI application, on average, to complete a successful attack. One attack attempt lasted just four seconds, the researchers found.

The study released this week, “The State of Attacks on GenAI,” puts a spotlight on the security risks associated with an emerging technology whose speed of innovation and adoption are outpacing the necessary protections.

“Generative AI enables unprecedented levels of productivity and innovation, opening up vast new opportunities,” the report’s authors wrote. “With new AI models and use cases being developed and deployed in just months, Security and AI leaders grapple with balancing substantial gains against potential setbacks, particularly security vulnerabilities.”

Pillar executives point to ongoing developments, such as AI agents designed to work autonomously, perform sophisticated tasks, make decisions, and solve problems, as other ingredients that will make generative AI security even more complex. Defenses will need to adapt to the ever-evolving generative AI environment, according to Jason Harrison, Pillar’s chief revenue officer.

“Static controls are no longer sufficient in this dynamic AI-enabled world,” Harrison said in a statement. “Organizations must invest in AI security solutions capable of anticipating and responding to emerging threats in real-time, while supporting their governance and cyber policies.”

Pillar’s study echoes others, such as one released in May by the U.K. AI Safety Institute, which found that built-in safeguards are ineffective at protecting LLMs.

Telemetry from 2,000 AI Apps

Pillar offers a platform that enables enterprises to secure their AI lifecycle, from the development and use of applications to the detection of risks in areas like the AI applications themselves and cloud provider environments.

The company based its report on telemetry data culled from the interactions of more than 2,000 AI applications over three months. The researchers not only found how easy it is for threat actors to steal data from the applications but also identified the most successful jailbreak methods, the increasing complexity and use of prompt injection attacks, and the primary goals of the hackers.

The study said that OpenAI’s GPT-4 is the commercial LLM most targeted by threat groups due to its advanced capabilities and widespread use. Meta’s Llama 3 is the open source model most often attacked, as bad actors want to exploit the same features – its accessibility and flexibility – that make it popular among developers.

Customer support applications are the most targeted by hackers, accounting for 25% of all attacks, with the researchers writing that this is expected given their widespread use and central role in customer interactions. The energy sector, consultancy services, and engineering software industries were the most frequently attacked, they wrote.

When building generative AI applications, developers include both model- and prompt-level protections to safeguard against issues such as harmful content generation, inappropriate responses, and malicious queries. They also set ethical boundaries and restrict particular topics.
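As a rough illustration (not Pillar’s implementation or any specific vendor’s), a prompt-level protection often amounts to a system prompt that sets boundaries plus a lightweight screen applied before the user input reaches the model; the topic list, system prompt, and helper names below are hypothetical.

```python
# Minimal sketch of a prompt-level guardrail: a system prompt that sets
# ethical boundaries plus a simple topic screen applied before the user
# input ever reaches the model. Topic list and function names are
# illustrative only, not taken from any specific product.

RESTRICTED_TOPICS = {"credentials", "malware", "internal pricing"}

SYSTEM_PROMPT = (
    "You are a customer-support assistant. Answer only questions about "
    "the product. Refuse requests for confidential data or harmful content."
)

def violates_policy(user_input: str) -> bool:
    """Naive keyword screen; real guardrails rely on classifiers, not keywords."""
    lowered = user_input.lower()
    return any(topic in lowered for topic in RESTRICTED_TOPICS)

def build_messages(user_input: str) -> list[dict]:
    """Assemble the chat messages, blocking requests that trip the policy."""
    if violates_policy(user_input):
        raise ValueError("Request blocked by prompt-level policy")
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_input},
    ]
```

As the report notes, both layers of this kind of protection can be bypassed, which is the subject of the next section.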

However, both can be bypassed by prompt injection attacks – where bad actors manipulate instructions and trick the model into following unauthorized commands – and jailbreaking, which involves disabling or getting around the model’s safety and ethical constraints. The two methods can work together or independently, though jailbreaks tend to lay the groundwork, the researchers wrote.

Popular Jailbreaks

The three most popular jailbreak techniques include “ignore previous instructions,” where attackers have the LLM disregard its initial prompts or safety guidelines, and “strong arm attack,” which involves threat actors using forceful commands to get around built-in safety filters, taking advantage of the model’s tendency to follow instructions.

The third technique is “Base64 encoding,” which involves encoding prompts in Base64 to get around content filters. “By presenting the encoded text, users attempt to trick the model into decoding and processing disallowed content that would normally be blocked by moderation systems,” they wrote.
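A hedged sketch of why the Base64 technique works against naive filters: a keyword-based moderation check sees only the encoded string, so the disallowed text never matches, and if the model later decodes the payload the blocked content gets processed anyway. The blocklist and filter below are hypothetical stand-ins for a simple moderation layer.

```python
import base64

# A naive keyword filter, standing in for a simple moderation layer.
BLOCKLIST = {"disallowed instruction"}

def naive_filter(text: str) -> bool:
    """Return True if the text trips the keyword blocklist."""
    return any(term in text.lower() for term in BLOCKLIST)

payload = "disallowed instruction"
encoded = base64.b64encode(payload.encode()).decode()

print(naive_filter(payload))   # True  -- the plain text is caught
print(naive_filter(encoded))   # False -- the encoded form slips past the filter
# If the model then decodes the Base64 on request, the blocked content is
# processed anyway; filters need to decode or normalize inputs before screening.
```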

While the technology is relatively new, the motivations behind the attacks are familiar, according to Pillar. Hackers want to steal sensitive data and proprietary information, generate disinformation, phishing messages, and hate speech, and hijack infrastructure resources, the researchers wrote, adding that “not all attacks are driven by malicious intent; some are motivated by curiosity or mischief.”

Generative AI presents bad actors with a wide landscape to work in, according to Pillar. Attacks can be launched in whatever language an LLM is trained in and “can exploit vulnerabilities at every stage of interaction with LLMs, including inputs, instructions, tool outputs, and model outputs,” they wrote. “This underscores the importance of implementing comprehensive security measures throughout the entire interaction pipeline, not just at isolated points.”
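As a sketch of what screening “the entire interaction pipeline” could mean in practice, the same validation can be applied at each stage – user input, retrieved instructions, tool output, and model output – rather than only at the prompt. The stage names, toy heuristic, and function below are assumptions for illustration, not any product’s API.

```python
# Hypothetical sketch: run the same screening at every stage of an LLM
# interaction (input, instructions, tool output, model output), echoing the
# report's point that checks at a single point are not enough.

from typing import Callable

def screen(text: str, stage: str, checks: list[Callable[[str], bool]]) -> str:
    """Raise if any check flags the text; otherwise pass it through unchanged."""
    for check in checks:
        if check(text):
            raise RuntimeError(f"Blocked at stage '{stage}'")
    return text

def looks_like_injection(text: str) -> bool:
    # Toy heuristic for illustration only; real systems use trained detectors.
    return "ignore previous instructions" in text.lower()

checks = [looks_like_injection]

user_input   = screen("What is the refund policy?", "input", checks)
instructions = screen("Answer using the policy document.", "instructions", checks)
tool_output  = screen("Refunds are issued within 14 days.", "tool_output", checks)
model_output = screen("You can request a refund within 14 days.", "model_output", checks)
```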

A Focus on Security Needed

The researchers also warned that the “unchecked proliferation” of AI technologies lacking robust security will continue to be a problem, and that their widespread use and evolution will increase cybersecurity risks.

“The anticipated ubiquity of local AI models could further exacerbate security challenges, as monitoring and controlling threats across millions of decentralized endpoints becomes increasingly complex,” the researchers wrote, as will AI agents. “These autonomous systems can interact with various environments and make independent decisions. The combination of widespread AI adoption, local models, and autonomous agents creates a multifaceted threat landscape that requires immediate and comprehensive attention.”

Organizations need to move from static to dynamic security approaches, including context-aware measures that can adapt as AI systems evolve, that are model-agnostic, and that anticipate and respond to emerging threats.
