As AI transforms industries, security remains critical. Discover the importance of a security-first approach in AI development, the risks of open-source tools, and how Tenable's solutions can help protect your systems.
Artificial Intelligence (AI) is transforming industries and is being widely adopted by software developers to build core business applications. However, as organizations embrace these advancements, it remains critical to ensure that the security of their users, their data and their underlying infrastructure is not compromised. According to a recent survey conducted by BairesDev, nearly 72% of the software engineers interviewed use generative AI in their development work.
In the world of cybersecurity, one critical rule is “Never trust user inputs.” This rule should be in the mind of every developer and should also be extended to AI technologies. AI systems, such as chatbots, act as intermediaries: they process user inputs and generate outputs based on them. These AI technologies should therefore be treated as a new form of input and be subject to the same level of scrutiny and security measures.
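As a minimal sketch of what treating a model's reply like any other untrusted input can look like, the Python snippet below escapes a chatbot reply before placing it in an HTML page. The function name `render_chat_reply` and the surrounding `div` markup are hypothetical and chosen only for illustration; a real application would normally rely on its templating engine's auto-escaping.

```python
import html


def render_chat_reply(model_output: str) -> str:
    """Wrap a model reply in HTML after escaping it.

    The reply is handled exactly like user-supplied text: characters such as
    <, >, & and quotes are converted to HTML entities so that any markup the
    model produced is displayed as text instead of being interpreted.
    """
    return '<div class="reply">' + html.escape(model_output) + "</div>"
```

With this in place, a reply containing a `<script>` tag is shown to the reader as literal text rather than executed by the browser.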
This blog post delves into key security concerns, emphasizing the need for a security-first approach.
AI tools are, most of the time, open-source, ready-to-use software designed to run locally on the developer’s machine. Many of these tools do not adhere to robust security practices by default, making them susceptible to exploitation. While analyzing some of the most common projects available on GitHub, we discovered that, for example, most of them do not offer any authentication by default, leaving their embedded dashboards and APIs open to any user who can reach them over the network. The combination of a web interface, an API and a CLI also increases their attack surface.
The surge in market interest in AI-related tools and applications has probably had a negative influence on their development, favoring proof-of-concept (PoC) software that quickly becomes popular over battle-tested software.
In this era of cloud infrastructure, where it is easy to quickly build new services or rely on pre-existing Docker images and expose them on the Internet, it can be highly risky for an organization to leave this door open. In such situations, deploying, for example, an internal AI model on a tool lacking proper authentication could have dramatic outcomes. A recent example is the Ollama tool, which allowed remote code execution (RCE) without any specific configuration other than having its API exposed.
During our research, we discovered several zero-day vulnerabilities in projects that are very popular on GitHub, as shown by their star and fork counts. However, despite many coordinated disclosure attempts, the projects' maintainers have not responded in a reasonable amount of time (and sometimes not at all). We think this is evidence of a lack of security maturity in this ecosystem, which seems to favor speed of delivery to the detriment of security.
While conducting our research, we also found that patches for previous vulnerabilities could be bypassed, as with this NextChat Server-Side Request Forgery (SSRF) vulnerability. Our analysis of a well-known tool named Langflow also highlighted a vulnerability in its permission model implementation, allowing a low-privileged user to gain super admin privileges without any interaction.
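When a tool ships without authentication, one common stopgap is to place a small check in front of its API. The sketch below, in Python, shows a bearer-token comparison. The function name `is_authorized` and the way headers are passed as a plain dictionary are assumptions made for illustration and do not describe any specific AI tool.

```python
import hmac


def is_authorized(headers: dict, expected_token: str) -> bool:
    """Return True only if the request carries the expected bearer token.

    `headers` is assumed to be a plain mapping of header names to values.
    An empty `expected_token` is rejected so that a missing configuration
    value never results in an open API.
    """
    supplied = headers.get("Authorization", "")
    prefix = "Bearer "
    if not expected_token or not supplied.startswith(prefix):
        return False
    # compare_digest runs in constant time, which avoids leaking how many
    # leading characters of the token were correct.
    return hmac.compare_digest(
        supplied[len(prefix):].encode("utf-8"),
        expected_token.encode("utf-8"),
    )
```

A check like this is not a substitute for a proper authentication layer, but it shows that denying by default does not require much code.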
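A simple pre-deployment check can catch the most obvious form of this exposure: a service configured to listen on every network interface. The Python sketch below is illustrative only; the function name `is_exposed_bind_address` is hypothetical, and the check says nothing about firewalls, reverse proxies or container port mappings that may sit in front of the service.

```python
import ipaddress


def is_exposed_bind_address(host: str) -> bool:
    """Return True if listening on `host` would accept non-local connections.

    Loopback addresses (127.0.0.0/8, ::1) and the name "localhost" are
    considered local. Everything else, including wildcard addresses such as
    0.0.0.0 and ::, is considered exposed.
    """
    if host in ("", "*"):
        return True  # many servers treat an empty host as "all interfaces"
    if host == "localhost":
        return False
    try:
        address = ipaddress.ip_address(host)
    except ValueError:
        # Not an IP literal: assume the worst rather than resolving the name.
        return True
    return not address.is_loopback
```

A deployment script could refuse to start, or insist on an authentication token, whenever this function returns True.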
Large language models (LLMs) require substantial compute and storage resources, making it challenging for many organizations to deploy and maintain them on-premises. Consequently, it is often easier to rely on third-party providers to manage these resource-intensive models, which avoids the hassle of managing the underlying infrastructure and leaves more time for the business aspects. However, relying on such third-party services means trusting these providers with potentially business-critical data, which can be difficult.
The critical risks related to such usage are real and should be handled on different levels:
As organizations embrace these new technologies to enhance their business, they should ensure that their AI governance rules cover these risks.
More than any other technology, AI is built to fully leverage the data that it consumes. One of its goals is to ensure that organizations take full advantage of the data and knowledge gained over the years to help them move quickly in their operating field, take appropriate actions and make decisions in a shorter period of time, with a high level of confidence and accuracy.
The dataset used to train the model should be seen as an input and should be carefully analyzed. Using confidential business information might inadvertently lead to a leak through model outputs, which can cause a significant security breach. Biased data can also result in AI software making unfair or harmful decisions.
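One way to act on this is to scrub obviously sensitive values from text before it enters a training set. The Python sketch below is deliberately narrow: the email pattern is a simplification, and the `sk-`/`key-` secret format is a hypothetical example rather than the format of any real provider. Production pipelines typically rely on dedicated data-loss-prevention tooling.

```python
import re

# Simplified email pattern; it will not match every valid address.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Hypothetical secret format used only for this illustration.
SECRET_RE = re.compile(r"\b(?:sk|key)-[A-Za-z0-9]{16,}\b")


def redact(text: str) -> str:
    """Replace email addresses and secret-looking tokens with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    return SECRET_RE.sub("[SECRET]", text)
```

Running every record through a function like `redact` before training reduces, but does not eliminate, the chance that confidential values resurface in model outputs.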
A good approach to handle model security is to focus on security considerations based on confidentiality, integrity and availability. Some examples include:
LLMs introduce new classes of vulnerabilities that traditional security measures may not address properly. The most prevalent AI-related vulnerabilities are prompt injection attacks, model theft and training data poisoning.
Prompt injection attacks involve malicious users crafting inputs to manipulate LLMs into generating harmful or unauthorized outputs. Remember the “Never trust user inputs” cardinal rule? In this case, the LLM acts as an intermediary between the user inputs and the system. This could result in the system disclosing sensitive information, executing malicious commands, or becoming an attack vector for other common vulnerabilities such as stored cross-site scripting (XSS). As an example, Vanna.AI, a Python-based library designed to simplify SQL queries from natural language inputs, was recently identified as vulnerable to prompt injection attacks leading to remote code execution on vulnerable systems.
Models should be protected in the same way we protect confidential and business-critical data. The first part of this blog post described how easily some AI tools can expose data to unauthorized actors. Applying defense-in-depth principles will help minimize intellectual property leakage if model theft occurs. Hardening model security with techniques such as encryption and obfuscation and having proper monitoring in place is crucial.
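For applications that turn natural language into SQL, one general mitigation is to run model-generated statements on a connection that can only read. The Python sketch below uses the standard `sqlite3` module's authorizer hook for this. It is an illustration of the principle, not a description of how Vanna.AI or any other library handles the issue, and the helper names `restrict_to_reads` and `run_generated_sql` are hypothetical.

```python
import sqlite3

# Only plain SELECT statements and column reads are allowed. Every other
# action (INSERT, DROP, PRAGMA, ATTACH, function calls, ...) is denied.
READ_ONLY_ACTIONS = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ}


def read_only_authorizer(action, arg1, arg2, db_name, source):
    """Authorizer callback: allow reads, deny everything else."""
    if action in READ_ONLY_ACTIONS:
        return sqlite3.SQLITE_OK
    return sqlite3.SQLITE_DENY


def restrict_to_reads(conn: sqlite3.Connection) -> None:
    """Install the read-only authorizer on a connection used for model SQL."""
    conn.set_authorizer(read_only_authorizer)


def run_generated_sql(conn: sqlite3.Connection, sql: str) -> list:
    """Execute one model-generated statement and return its rows.

    `execute` accepts a single statement, so stacked queries are rejected,
    and the authorizer makes sqlite3 raise an error for anything that is
    not a read.
    """
    return conn.execute(sql).fetchall()
```

The allow-list here is intentionally strict, so queries that call SQL functions would also be refused unless the corresponding action is added. Pairing a restriction like this with a database account that has only the privileges it needs keeps a successful prompt injection from becoming a write or a code execution path.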
Finally, AI training data poisoning is a modern supply-chain attack. By altering the data used by the model, attackers can corrupt its behavior and elicit biased or harmful output, leading to direct impacts on the applications using it to achieve business goals.
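Because poisoning depends on altering data after it has been vetted, a basic safeguard is to record a cryptographic hash of each approved dataset file and verify it before every training run. The Python sketch below assumes a manifest that maps relative file names to SHA-256 digests. That manifest layout and the function names are assumptions made for illustration, and the check detects tampering with stored files only, not data that was malicious when first collected.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_modified_files(root: Path, manifest: dict) -> list:
    """Return the manifest entries that are missing or whose hash changed.

    `manifest` maps file names relative to `root` to the SHA-256 digest
    recorded when the dataset was reviewed and approved.
    """
    modified = []
    for relative_name, expected in sorted(manifest.items()):
        path = Path(root) / relative_name
        if not path.is_file() or sha256_of_file(path) != expected:
            modified.append(relative_name)
    return modified
```

A training job can call `find_modified_files` first and stop if the returned list is not empty.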
As in other, more traditional fields, developers should always stay updated with the latest security guidelines and incorporate strategies from the OWASP Top 10 for LLMs. Techniques such as input validation, anomaly detection, and robust monitoring of the AI ecosystem's behavior can help detect and mitigate potential threats.
AI technologies are promising and can transform many industries and businesses, offering innovation and efficiency opportunities. However, they represent a huge security challenge at many levels in organizations and this should not be overlooked.
By adopting a security-first approach, following best practices and having robust governance, organizations can harness the power of AI and mitigate the emerging threats related to its adoption.
Read more about how we help secure these tools:
Rémy joined Tenable in 2020 as a Senior Research Engineer on the Web Application Scanning Content team. Over the past decade, he led the IT managed services team of a web hosting provider and was responsible for designing and building innovative security services in a Research & Development team. He also contributed to open-source security software, helping organizations improve their security posture.
Interests outside of work: Rémy enjoys spending time with his family, cooking and traveling the world. Being passionate about offensive security, he enjoys doing ethical hacking in his spare time.