Researchers at the Spark Research Lab (University of Texas at Austin)1, under the supervision of Symmetry CEO Professor Mohit Tiwari uncovered a novel attack method, dubbed ConfusedPilot. This novel attack method targets widely used Retrieval Augmented Generation (RAG) based AI systems, such as Microsoft 365 Copilot. This attack allows manipulation of AI responses simply by adding malicious content to any documents the AI system might reference, potentially leading to widespread misinformation and compromised decision-making processes within the organization.. With 65% of Fortune 500 companies currently implementing or planning to implement RAG-based AI systems, the potential impact of these attacks cannot be overstated.
– Requires only basic access to manipulate responses by RAG based AI Systems.
– Affects all major RAG implementations
– Can persist even after malicious content is removed
– Bypasses current AI security measures
In this document, we provide a high-level overview of ConfusedPilot and its implications for organizations using RAG-based AI systems. Given the widespread and rapid adoption of AI Copilots and the potential sensitivity of this vulnerability, we have chosen to withhold certain technical details and specific exploit information at this time.
ConfusedPilot was discovered and researched by a team of cybersecurity experts and computer scientists (Note 1) from the University of Texas at Austin, under the supervision of Professor Mohit Tiwari, who directs the SPARK lab at UT Austin and is also the CEO of Symmetry Systems.
In normal circumstances, a retrieval Augmented Generation (RAG) based AI system will use a retrieval mechanism to extract relevant keywords to search and match with resources stored in a Vector database, using that embedded context to create a new prompt containing the relevant information to reference.
The researchers demonstrated that due to the architecture used by Retrieval Augmented Generation (RAG) to essentially leverage content as a prompt in certain circumstances, an attacker could indirectly manipulate AI-generated responses by adding content to any documents the AI system might reference, potentially leading to widespread misinformation and compromised decision-making processes within the organization.
An adversary attempting ConfusedPilot attack would likely follow these steps:
While these types of attack could be directed at any organization or individual using RAG-based AI systems, they are especially relevant for large enterprises and service providers that allow multiple users or departments to contribute to the data pool used by these AI systems. Any environment that allows the input of data from multiple sources or users – either internally or from external partners – is at higher risk, given that this attack only requires data to be indexed by the AI Copilots. Essentially the attack surface to introduce harmful data, manipulate the AI’s responses, and potentially use this to target the organization’s decision making tools is increased exponentially as demonstrated by the below image.
The urgency with which you should take steps to defend against these forms of attacks depends on your organization’s use of RAG-based AI systems, the level of trust required and boundaries you place around the data sources used by these systems.
A few illustrative examples:
The disclosure of ConfusedPilot at DEF CON’s AI Village sparked significant industry attention, with Microsoft taking the lead in reaching out to the research team. While the demonstration at DEF CON focused on Microsoft’s Copilot, the researchers emphasized that the attack was possible on RAG architectures broadly, prompting interest from various organizations planning or implementing AI systems. The industry’s response has been notably positive, particularly regarding the team’s approach to not just highlighting the problem but providing practical mitigation strategies using existing tools.
Given the deeply intertwined nature of data and AI copilot’s in RAG architectures, protecting against ConfusedPilot requires a multi-faceted approach that addresses the security of both the data inputs and the AI outputs. Longer term, better architectural models will be required to try to separate the data plan from the control plan in these models. Key current strategies for mitigation include:
The emergence of ConfusedPilot has revealed the inseparable relationship between data and AI security in RAG-based systems. This novel attack technique demonstrates how data poisoning can effectively compromise AI outputs, emphasizing the need to treat data as a core component of AI security architecture. The performance and reliability of AI systems are intrinsically linked to the quality and security of their training data, requiring a fundamental shift in security approaches.
The insider threat vector is particularly significant in this context. ConfusedPilot shows how individuals with even minimal data access can potentially influence AI system outputs. This highlights the limitations of current security models and underscores the need for comprehensive data security posture management.
Addressing these challenges requires an integrated defense strategy centered on robust data security posture management (DSPM) tools. These tools provide critical capabilities such as continuous data discovery, classification, and risk assessment across an organization’s entire data estate. When combined with access controls and employee training, DSPM tools form the cornerstone of a strong defense against data poisoning attacks. Organizations should prioritize implementing DSPM solutions that can monitor data lineage, detect anomalous changes, and ensure the integrity of data feeding into AI systems.
While AI continues to drive innovation, the discovery of ConfusedPilot marks a turning point in how organizations must think about Data+AI security. The integrity of AI systems is intrinsically tied to the integrity of the data they reference. As AI adoption accelerates, ensuring that data is protected from manipulation will be a cornerstone of securing the future of AI-driven business operations.
1 UT AUSTIN Research TEAM
Ayush Roychowdhury (UT Austin) Ayush RoyChowdhury is a first year masters student at the Chandra Department of Electrical and Computer Engineering at the University of Texas Austin. His research interests include language model security, data security, and explainable artificial intelligence for security. |
Mulong Luo (UT Austin) Mulong Luo is currently a postdoctoral researcher at the University of Texas at Austin. His research interests are in computer architecture, side channel, and machine learning. He won best paper award at CPS-SPC workshop. He got his Ph.D. from Cornell University in 2023. |
Prateek Sahu (UT Austin) Prateek Sahu is currently a Ph.D. student at the University of Texas at Austin. His research interests are microservices, service mesh, cloud computing, function-as-a-service measurement. |
Sarbartha Banerjee (UT Austin) Sarbartha Banerjee is currently a Ph.D. candidate at the University of Texas at Austin. His research interests are secure accelerators, side channel defense, machine learning security. |
Mohit Tiwari (Symmetry Systems / UT Austin) Mohit Tiwari is an associate professor and he directs the SPARK lab at the University of Texas at Austin. He is also the CEO of Symmetry Systems, Inc. His current research focuses on building secure systems, all the way from hardware to system software to applications that run on them. Prof. Tiwari received a PhD from UC Santa Barbara (2011) and was a post-doctoral fellow at UC Berkeley (2011-13) before joining UT. |
The post ConfusedPilot: UT Austin & Symmetry Systems Uncover Novel Attack on RAG-based AI Systems appeared first on Symmetry Systems.
*** This is a Security Bloggers Network syndicated blog from Symmetry Systems authored by Claude Mandy. Read the original post at: https://www.symmetry-systems.com/blog/confused-pilot-attack/