Researchers Find Flaws in OpenAI ChatGPT, Google Gemini

Researchers Find Flaws in OpenAI ChatGPT, Google Gemini
2024-3-14 23:53:5 Author: securityboulevard.com(查看原文) 阅读量:8 收藏

The number of generative AI chatbots and their adoption by enterprises have exploded in the year-plus since OpenAI rolled out ChatGPT, but so have concerns by cybersecurity pros who worry not only about threat group use of the emerging technology but also the security of the large-language models (LLMs) themselves.

That was on display this week, with researchers from two cybersecurity vendors who separately found vulnerabilities in ChatGPT and Gemini, Google’s generative AI chatbot that until last month was known as Bard.

API security firm Salt Security this week unveiled security flaws within ChatGPT plugins that could have let bad actors gain access to accounts and sensitive data on third-party sites. After being notified about the vulnerabilities, OpenAI and third-party vendors remediated the issues. There were no indications the flaws were exploited in the wild.

A day earlier, AI security vendor HiddenLayer in a report said that bad actors can manipulate Gemini’s LLM to leaking system prompts and opening it up to more targeted attacks, creating misinformation – a key concern as the United States and other countries gear up for high-profile elections this year – and harming users through indirect injections through Google Workspace.

The research by Salt Labs – Salt Security’s research arm – and HiddenLayer are important checks on security as enterprises ramp up their adoption of generative AI tools. According to Box, an IDC survey the cloud security company sponsored found that two-thirds of respondents said they had already deployed generative AI in either parts of their companies or more broadly.

“Enterprise adoption of AI is driving these growth trends as organizations look for technologies to automate business processes, boost employee productivity, and reduce costs,” Box wrote in a blog post in January.

Given that, it’s important to keep an eye on security, according to Aviad Carmel, security researcher at Salt.

“Generative AI brings many benefits to businesses, and eventually almost any company will use generative AI in some way,” Carmel told SecurityBoulevard. “We support companies aiming to introduce new capabilities. It’s a good thing as long as it’s done securely. However, the rapid advancement in this area has introduced a significant cybersecurity gap that demands more attention than usual.”

The Generative AI Ecosystem

In the report on ChatGPT, Carmel wrote that early releases of that and other generative AI frameworks only held data that was available to them during the training process, which limited the questions that could be asked of them. That’s changed.

“To address these issues, all major Generative AI platforms have included the concept of a Generative AI ecosystem, which allows the connection and data exchange between the Generative AI platform and external services,” he wrote. “These services could be anything from a simple internet search to a connection to specific services like GitHub, Google Drive, Saleforce, etc.”

Through this, ChatGPT is more than a conversational chatbot, becoming a “powerful tool that can act on a wide range of platforms, streamlining workflows and providing more interactive and productive experiences. Similar to Generative AI’s massive growth, these external connections gained a lot of traction and very quickly expanded (and are still growing) to include hundreds of different external connections,” Carmel wrote.

The generative AI ecosystem concept makes ChatGPT and other chatbots through their plugins entrance points to threats to third-parties. The plugins give ChatGPT permission to send sensitive data to a third-party website and at times permission to access private accounts on Google Drive, GitHub, and other places.

Flaws in ChatGPT

Salt Labs researchers found three types of flaws within ChatGPT plugins, including one in ChatGPT. When users install new plugins, the chatbot redirectors them to the plugin website to get a code that has to be approved by the user. With the OAuth-approved code, ChatGPT automatically installs the plugin and can interact with the plugin on the user’s behalf. Hackers could exploit the function and deliver a code approval with a new malicious plugin, letting the attacker install their credentials on the victim’s account.

“Since the attacker is the owner of this plugin, he can see the private chat data of the victim, which may include credentials, passwords or other sensitive data,” Carmel wrote.

Another flaw in PluginLab, which developers and organizations use to develop ChatGPT plugins, didn’t properly authenticate user accounts. It could allow an attacker to insert another user ID and get a code representing the victim, which allow them to takeover an account. The third vulnerability was found in several plugs that would not have validated URLs sent to victims by attackers, letting them take over the account.

API Attacks a Growing Threat

Attacks on APIs are a growing problem in general, according to Carmel, who pointed to Salt’s Q1 2023 API security report that found a 400% increase from the previous six months in attacks on Salt customers. APIs are the core of every modern application, and generative AI is no different.

“Those APIs (the exact communication) are often exposed to attackers, seeing any request and response from the server, and that’s a new attack surface that is also relevant to LLM,” he said.

Carmel said that what Salt found with ChatGPT is applicable to any generative AI platform, though the focus was on OpenAI’s chatbot. HiddenLayer researchers had a similar message in their report about the flaws found in Gemini, which included LLM prompt leakage and jailbreaks.

Taking a Look at Gemini

Gemini has three model sizes – Gemini Nano for lightweight applications like on-device processing; Pro, for scaling across a broad range of tasks; and Ultra, for complex tasks. It competes with OpenAI’s GPT-4. Most of HiddenLayer’s tests were run on Gemini Pro.

“The Gemini Pro model currently fills the role of a flexible, accessible AI model for developers,” Kenneth Yeung, associate threat researcher at HiddenLayer, wrote in the report. “Its balanced performance and capabilities make it well-suited for powering chatbots, content generation tools, search improvement systems, and other applications requiring natural language understanding and generation.”

The first vulnerability led to the leaking of the system prompts, which are the instructions given to the LLM. Prompt leaks are dangerous if hackers reverse engineer them to steal them or create a more potent attack or to steal sensitive information from them, such as passwords. HiddenLayer researchers were able to manipulate the prompt to get around guardrails and get the exact instructions.

“This attack exploits the Inverse Scaling property of LLMs,” Yeung wrote. “As LLMs get larger in size, it becomes extremely difficult to fine-tune on every single example of attack that exists. Models, therefore, tend to be susceptible to synonym attacks that the original developers may not have trained them on.”

They also were able to use a reset simulation approach to get the system to leak information from the prompt.

Jailbreaking the LLM

In addition, the researchers could manipulate Gemini Pro by using a fictional story to get around defenses Google had put in place to keep bad actors from using the LLM to jailbreak generate misinformation about elections.

“This jailbreak attack shows that though the model has been tuned to reject any misinformation surrounding elections (try it!), it isn’t capable of preventing all misinformation,” he wrote.

The researchers’ demonstration came the same week Google outlined steps it was taking to protect against misinformation and other election-related threats this year in the United States and India.

HiddenLayer also successfully used the same jailbreak attack against Gemini Ultra – including having the chatbot create instructions for hot wiring a car – and to extract parts of the system prompt, though through what Yeung said was a “slightly tweaked method.” The researchers uncovered some other vulnerabilities in Ultra that demonstrated the inverse scaling effect, with the largest being a multi-step jailbreak that leveraged the LLM’s reasoning abilities.