Exposed Hugging Face APIs Opened AI Models to Cyberattacks
December 5, 2023 | securityboulevard.com

Security flaws found in both Hugging Face and GitHub repositories exposed almost 1,700 API tokens, opening up AI developers to supply chain and other attacks and putting a brighter spotlight on the need to ensure that security keeps up with the accelerating pace of innovation in AI and large language models (LLMs).

In a report today, researchers with startup Lasso Security found more than 1,500 exposed API tokens on the Hugging Face platform – essentially GitHub for the AI set – that allowed them to access the accounts of 723 organizations, including such companies as Microsoft, Google, Meta, and VMware.

Among those accounts, 655 users’ tokens had write permissions – 77 of them to different organizations – which granted the researchers full control of the repositories of prominent companies. Those included Meta, which runs the Llama project, EleutherAI and its Pythia project, and BigScience Workshop (Bloom project).

“Our investigation led to the revelation of a significant breach in the supply chain infrastructure, exposing high-profile accounts,” Bar Lanyado, security researcher at Lasso, wrote in the report. “The gravity of the situation cannot be overstated. With control over an organization boasting millions of downloads, we now possess the capability to manipulate existing models, potentially turning them into malicious entities. This implies a dire threat, as the injection of corrupted models could affect millions of users who rely on these foundational models for their applications.”

Along with the supply chain threat the exposed tokens represented, they also opened up the possibility of bad actors poisoning training data. Lasso researchers obtained access to 14 datasets that see hundreds of thousands of downloads a month.

“By tampering with these trusted datasets, attackers could compromise the integrity of machine learning models, leading to widespread consequences,” Lanyado wrote.

In addition, the researchers could have stolen more than 10,000 private AI models that were linked to more than 2,500 datasets.

Surprising Results

Lanyado told Security Boulevard that he expected the company’s research would return some vulnerabilities, but the results surprised him.

“I was extremely overwhelmed with the amount of tokens we were able to expose, and the type of tokens,” he said. “We were able to access nearly all of the top technology companies’ tokens, and gain full control over some of them.”

Major companies like Meta, Microsoft, and Google take pride in their security capabilities but still were unaware of the significant third-party risk, he added.

“In large enterprises, like the one we accessed, R&D teams are forging ahead with implementations and applications daily,” Lanyado said. “In such an environment, companies need to constantly monitor the integration and utilization of generative AI and LLMs to understand not just where technology is headed but also how and where it is being applied for developmental purposes. This awareness ensures that these technological strides align with the company’s business [and] also security objectives.”

New to the Scene

Lasso is a Tel Aviv-based startup that only came out of stealth mode November 20 with $6 million in seed money led by Entrée Capital, with Samsung Next also contributing. The company focuses on cybersecurity for LLMs.

A key Hugging Face asset is its open-source Transformers library, he wrote, which holds more than 500,000 AI models and 250,000 datasets, including the Meta-Llama, Bloom, and Pythia models. The researchers ran into some roadblocks when they started searching for API tokens in both Hugging Face and GitHub, but were able to dig deeper through increasingly detailed searches.
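Lasso didn’t publish its exact queries, but a first pass over public GitHub code could be sketched as below, using GitHub’s code-search REST API and the hf_ prefix that Hugging Face user tokens carry. The query string, the result handling, and the GITHUB_TOKEN variable are illustrative assumptions, not details from the report:

```python
import os
import requests

# GitHub requires authentication for code search; GITHUB_TOKEN is assumed
# to be set in the environment.
resp = requests.get(
    "https://api.github.com/search/code",
    params={"q": "hf_ in:file", "per_page": 30},
    headers={
        "Accept": "application/vnd.github+json",
        "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
    },
    timeout=30,
)
resp.raise_for_status()

for item in resp.json().get("items", []):
    # Each hit identifies a file; a real scanner would fetch its contents
    # and run a token regex such as r"hf_[A-Za-z0-9]{30,}" over them.
    print(item["repository"]["full_name"], item["path"])
```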

They then used the Hugging Face whoami API to verify each token’s validity and to gather such information as the token’s user, the user’s email, organization memberships, and permissions, as well as the token’s own permissions and privileges, Lanyado wrote.
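The report doesn’t detail Lasso’s tooling, but a token-validation step along these lines can be sketched with the huggingface_hub Python client, whose whoami() helper wraps that endpoint. The error handling and the specific fields read from the response are our assumptions about a typical reply:

```python
from huggingface_hub import HfApi
from huggingface_hub.utils import HfHubHTTPError

def check_token(token: str) -> dict | None:
    """Return account metadata for a valid token, or None if it is rejected."""
    try:
        # whoami() calls the Hugging Face whoami endpoint and returns details
        # such as the account name, email, and organization memberships.
        return HfApi().whoami(token=token)
    except HfHubHTTPError:
        return None

# Placeholder token; a real scan would loop over every candidate found.
info = check_token("hf_...")
if info:
    orgs = [org.get("name") for org in info.get("orgs", [])]
    print(info.get("name"), info.get("email"), orgs)
```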

Lasso researchers also found one other problem. Hugging Face had announced that org_api tokens were deprecated and had blocked them in its Python library. However, while the write functionality didn’t work, in some instances the read functionality did, and the researchers were able to download private models with an exposed org_api token, including one belonging to Microsoft.
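Lasso didn’t publish how it exercised that residual read access, but because the block was implemented in the Python library rather than on the server, one plausible probe is a raw HTTP request against the platform’s file-resolve endpoint. Everything below – the token format, repository name, and filename – is a placeholder:

```python
import requests

# All values are illustrative placeholders, not data from the research.
ORG_TOKEN = "api_org_..."            # deprecated organization token
REPO_ID = "some-org/private-model"   # hypothetical private repository
FILENAME = "config.json"

# A client-side block in a library doesn't stop a direct request; if the
# server still honors the token for reads, this returns 200 and file bytes.
resp = requests.get(
    f"https://huggingface.co/{REPO_ID}/resolve/main/{FILENAME}",
    headers={"Authorization": f"Bearer {ORG_TOKEN}"},
    timeout=30,
)
print(resp.status_code)
```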

The company contacted Hugging Face and all the organizations and users involved after completing the research. Hugging Face fixed the vulnerability, and many of the companies – including Meta, Google, Microsoft, and VMware – revoked the exposed tokens and removed the token code from public access.

Security is a Shared Responsibility

Lanyado said that while generative AI and LLMs are increasingly critical to businesses’ operations, organizations should use LLMs with a critical eye to ensure they’re secure.

“Due to LLMs’ advanced capabilities and complexity, they are vulnerable to multiple security concerns, as we showed in our recent research,” he said. “If an LLM is hacked, the attacker could gain access to the data source that it had been trained on, as well as the organization and user’s sensitive information. [It] poses a significant security risk.”

He also said developers should understand that Hugging Face and similar platforms aren’t secure enough on their own, so the responsibility for security falls on developers and other users. They also shouldn’t work with hard-coded tokens, which spares them from having to verify on every commit that no tokens or sensitive information are being pushed to repositories.
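A minimal sketch of that practice in Python, assuming a token stored in an environment variable named HF_TOKEN (the variable name and the failure behavior are our choices, not a Hugging Face convention):

```python
import os
from huggingface_hub import HfApi

# Read the token from the environment (or a .env file excluded via
# .gitignore) instead of committing it to source control.
token = os.environ.get("HF_TOKEN")
if token is None:
    raise RuntimeError("HF_TOKEN is not set; refusing to fall back to a hard-coded value")

# Confirm the token works without it ever appearing in the repository.
print(HfApi().whoami(token=token).get("name"))
```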

“We also recommend Hugging Face constantly scan for publicly exposed API tokens and revoke them or notify the users and organizations about the exposed tokens,” he said, noting that GitHub uses similar methods with OAuth, GitHub App, and personal access tokens.
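That kind of scanning can also run client-side, before a commit ever reaches a public repository. Below is a minimal pre-commit-style sketch; the hf_ prefix matches Hugging Face’s user-token format, but the length threshold is a heuristic of ours, not a published specification:

```python
import re
import sys
from pathlib import Path

# Flag anything that looks like a Hugging Face token before it is committed.
HF_TOKEN_PATTERN = re.compile(r"\bhf_[A-Za-z0-9]{30,}\b")

def scan(paths: list[str]) -> int:
    """Print every suspected token found in the given files; return the count."""
    hits = 0
    for path in paths:
        text = Path(path).read_text(errors="ignore")
        for match in HF_TOKEN_PATTERN.finditer(text):
            print(f"{path}: possible Hugging Face token {match.group()[:10]}...")
            hits += 1
    return hits

if __name__ == "__main__":
    # Non-zero exit blocks the commit when wired into a pre-commit hook.
    sys.exit(1 if scan(sys.argv[1:]) else 0)
```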

Lanyado pointed to other reports about security problems with AI, including Nvidia discovering three flaws in LangChain chains and Rezilion finding dangerous workflow patterns in the LLM open-source ecosystem. In addition, Bishop Fox recently wrote about a critical vulnerability in Ray, an open-source compute framework for AI, while researchers with Google and a few universities outlined a relatively easy way to extract training data from ChatGPT.

All this reinforces the claim that “the field of artificial intelligence is advancing very rapidly yet it remains inadequately secured,” he said.

Source: https://securityboulevard.com/2023/12/exposed-hugging-face-apis-opened-ai-models-to-cyberattacks/