Shadow AI: The Murky Threat to Enterprise Adoption of Generative AI
2024-04-13 | securityboulevard.com

Generative AI (GenAI) technologies, especially large language models like OpenAI’s GPT-4, continue to generate interest among enterprises eager to gain competitive advantages. Many companies recognize the potential of these technologies to revolutionize various aspects of their operations. However, despite the growing interest, there is a notable hesitance in adopting generative AI within enterprises.

According to Cisco’s 2024 Data Privacy Benchmark Study, data privacy is among the top concerns of enterprises. It is not merely a concern; it is a critical business imperative.

  • 91% of organizations say they need to be doing more to reassure customers about how their data is being used with AI.
  • 98% of organizations report privacy metrics to their board of directors.
  • 94% of organizations say their customers won’t buy from them if data is not adequately protected.

GenAI puts AI capabilities in the hands of many more users. Ninety-two percent of respondents see it as a fundamentally different technology with novel challenges and concerns that require new techniques to manage data and risk.

In addition, we are witnessing record-breaking fines imposed on companies worldwide for breaching the trust of their customers.

The stakes for data privacy have never been higher.

The Rise of Shadow AI

As artificial intelligence continues its relentless march into enterprises, an insidious threat lurks in the shadows that could undermine its widespread adoption: Shadow AI.

Much like the “shadow IT” phenomenon involving unauthorized software usage, Shadow AI refers to deploying or using AI systems without organizational oversight. But it poses far more risks to enterprises.

Whether due to convenience or ignorance, failure to properly govern AI development creates ticking time bombs. As AI becomes more accessible through cloud services while remaining opaque, the backdoors left open by lax controls could easily be exploited for misuse.

Employees eager to gain an edge can easily paste company data into ChatGPT or Google Bard with the best intentions, such as completing work faster and more efficiently. In the absence of a sanctioned, secure alternative, employees will turn to whatever tools are accessible.

Last spring, Samsung employees accidentally shared confidential information with ChatGPT on three occasions. The leaked material included source code and a meeting recording, prompting the company to prohibit its staff from using GenAI services.

In addition, with GenAI APIs readily accessible, software developers can integrate GenAI into their projects with little effort, adding exciting new features but often at the expense of security best practices.

The Risks of Shadow AI

As pressures mount to take advantage of GenAI, several threats are growing.

Data Leakage

The proliferation of GenAI tools presents a double-edged sword. On one hand, these tools offer remarkable capabilities in enhancing productivity and fostering innovation. On the other, they pose significant risks related to data leakage, especially in the absence of robust AI Acceptable Use Policies (AUPs) and enforcement mechanisms. The ease of access to GenAI tools has led to a concerning trend where employees, driven by enthusiasm or the pursuit of efficiency, might inadvertently expose sensitive corporate data to third-party services.
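
One practical way to operationalize an AI AUP is to screen prompts for obvious secrets before they leave the corporate boundary. The following is a minimal sketch of that idea in Python; the patterns and the `redact_prompt` helper are hypothetical illustrations, not part of any particular DLP product, and a real gateway would rely on far more robust detection (secret scanners, entity recognition, allow-lists).

```python
import re

# Illustrative patterns only; not exhaustive and not from any specific product.
REDACTION_PATTERNS = {
    "EMAIL": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "API_KEY": re.compile(r"\b(?:sk|key|token)[-_][A-Za-z0-9]{16,}\b"),
    "CREDIT_CARD": re.compile(r"\b(?:\d[ -]?){13,16}\b"),
}

def redact_prompt(prompt: str) -> tuple[str, list[str]]:
    """Replace likely-sensitive spans with placeholders before the prompt is
    sent to an external GenAI service. Returns the redacted text and the list
    of redaction types that fired, for audit logging."""
    findings = []
    for label, pattern in REDACTION_PATTERNS.items():
        if pattern.search(prompt):
            findings.append(label)
            prompt = pattern.sub(f"[{label} REDACTED]", prompt)
    return prompt, findings

if __name__ == "__main__":
    raw = "Summarize this: contact jane.doe@example.com, key sk-abcdef1234567890abcd"
    safe, hits = redact_prompt(raw)
    print(safe)   # placeholders instead of the email address and API key
    print(hits)   # ['EMAIL', 'API_KEY'] -> could be written to an audit log
```

A filter like this does not make an external service safe; it simply reduces the chance that the most obvious secrets leave the organization while a sanctioned solution is rolled out.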

It’s not just regular knowledge workers using chatbots. Last year, Microsoft AI researchers accidentally exposed 38 terabytes of data while publishing open-source LLM training data on the developer platform GitHub. The exposure included Microsoft employees’ personal computer backups containing sensitive personal data, including passwords to Microsoft services, secret keys and more than 30,000 internal Microsoft Teams messages from 359 Microsoft employees, according to security company Wiz.

Compliance Violations

Shadow AI tools, not vetted for compliance, can result in violations of regulations like GDPR, leading to legal repercussions and fines. Beyond that, there are a growing number of laws across multiple jurisdictions that companies need to be concerned with.

The soon-to-be-adopted EU Artificial Intelligence Act adds further complexity. Non-compliance can lead to fines ranging from 7.5 million euros or 1.5% of global turnover up to 35 million euros or 7%, depending on the infringement and company size.

On January 29th, the Italian Data Protection Authority (DPA), known as Garante per la Protezione dei Dati Personali, informed OpenAI that it had violated data protection laws. In March of last year, Garante had temporarily banned OpenAI from processing data. Based on the results of its fact-finding activity, the Italian DPA concluded that the available evidence pointed to breaches of the provisions contained in the EU GDPR.

Shining a Light on Shadow AI

Organizations need a privacy-preserving AI solution that bridges the gap between protecting privacy and realizing the full potential of LLMs.

Despite significant advances in AI technology, few organizations have succeeded in deploying AI-based applications that operate securely on confidential and sensitive data. Protecting privacy throughout the generative AI lifecycle requires strict data security controls: every security-critical operation that touches the model, and all confidential data used for training and inference, must be handled securely and efficiently.

Data sanitization and anonymization are often proposed as methods to enhance data privacy. However, these approaches may not be as effective as hoped. Data sanitization, the process of removing sensitive information from a dataset, is undermined by the very nature of GenAI: sanitization rarely catches every sensitive detail, and large models can memorize and later reproduce whatever slips through.

Anonymization, the process of stripping datasets of personally identifiable information, also falls short in the context of GenAI. Advanced AI algorithms have demonstrated the ability to re-identify individuals in anonymized datasets. For instance, research from Imperial College London revealed that a machine-learning model could re-identify individuals in anonymized datasets with remarkable accuracy. The study found that using just 15 characteristics, such as age, gender and marital status, 99.98% of Americans could be re-identified in any given anonymized dataset.
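
The intuition behind such re-identification results can be shown on a toy example: the more quasi-identifiers an “anonymized” record retains, the more likely it is to be unique, and a unique combination is all an attacker holding auxiliary data needs to single someone out. The sketch below uses fabricated records purely for illustration.

```python
from collections import Counter

# Toy "anonymized" records, fabricated for illustration: names are removed,
# but several quasi-identifiers remain.
records = [
    {"age": 34, "gender": "F", "zip": "02139", "marital": "single"},
    {"age": 34, "gender": "F", "zip": "02139", "marital": "married"},
    {"age": 28, "gender": "F", "zip": "02139", "marital": "single"},
    {"age": 51, "gender": "M", "zip": "94110", "marital": "married"},
    {"age": 51, "gender": "M", "zip": "94110", "marital": "married"},
]

def unique_fraction(rows, quasi_identifiers):
    """Fraction of rows whose quasi-identifier combination appears exactly once,
    i.e. rows an attacker with auxiliary data could single out."""
    combos = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    singled_out = sum(
        1 for row in rows if combos[tuple(row[q] for q in quasi_identifiers)] == 1
    )
    return singled_out / len(rows)

# Retaining more attributes tends to raise the share of uniquely identifiable records.
for qi in (["gender"], ["gender", "age"], ["gender", "age", "zip", "marital"]):
    print(qi, unique_fraction(records, qi))
```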

Furthermore, a study reported by MIT Technology Review emphasizes the ease of re-identifying individuals from anonymized databases, even when the datasets are incomplete or altered. The use of machine-learning models in this context shows that current anonymization practices are insufficient against the capabilities of modern AI technologies.

These findings suggest that policymakers and technologists need to develop more robust privacy-preserving techniques that can keep pace with the advancements in AI, as traditional methods like data sanitization and anonymization no longer suffice in ensuring data privacy in the era of GenAI.

A Better Solution for Data Privacy in GenAI

Privacy-enhancing technologies (PETs) are considered the best solution for safeguarding data privacy in the field of GenAI. By securing data processing and maintaining system functionality, PETs address data sharing, breaches and privacy regulation issues.

Notable PETs include:

  • Homomorphic encryption: Allows computations on encrypted data, outputting results as if processed on plaintext. Limitations include slower speed and reduced query complexity. Data integrity risks remain.
  • Secure Multi-Party Computation (MPC): Facilitates multiple parties working on encrypted datasets, protecting data privacy. Drawbacks include performance reduction, especially in LLM training and inference.
  • Differential privacy: Adds calibrated noise to data or query results to prevent re-identification of individuals, balancing privacy against analysis accuracy. It also does not secure data during computation, so it is typically combined with other PETs (a minimal sketch follows this list).
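
To make the differential-privacy bullet above concrete, here is a minimal sketch of the classic Laplace mechanism applied to a count query. It assumes NumPy is available, and the figures are purely illustrative: noise scaled to the query’s sensitivity divided by the privacy budget epsilon masks any single individual’s contribution.

```python
import numpy as np

def laplace_count(true_count: float, epsilon: float, sensitivity: float = 1.0) -> float:
    """Laplace mechanism: a counting query has sensitivity 1 (adding or removing
    one person changes the result by at most 1), so adding Laplace noise with
    scale sensitivity/epsilon yields an epsilon-differentially-private answer."""
    noise = np.random.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

true_count = 1_842  # e.g. "customers matching this filter" (made-up figure)
for eps in (0.1, 1.0, 10.0):
    # Smaller epsilon -> stronger privacy guarantee -> noisier released answer.
    print(f"epsilon={eps:>4}: released count ~ {laplace_count(true_count, eps):.1f}")
```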

While each of the described techniques offers ways to protect sensitive data, none delivers the full functionality and computational performance that generative AI models require. However, a newer approach, confidential computing, uses hardware-based trusted execution environments (TEEs) that prevent unauthorized access to or modification of applications and data while in use. This blocks unauthorized entities, such as the host operating system, hypervisor, system administrators, service providers, infrastructure owners or anyone with physical access to the hardware, from viewing or altering the data or code within the environment. Such hardware-based technologies provide a secure environment that keeps sensitive data safe.

Confidential Computing as a Privacy-Preserving AI Solution

Confidential computing is an emerging standard in the technology industry, focusing on protecting data during its use. This concept extends the data protection beyond data at rest and in transit to include data in use, which is particularly relevant in today’s computing environment that spans multiple platforms, from on-premises to cloud and edge computing.

This technology is crucial for organizations handling sensitive data, such as personally identifiable information (PII), financial data or health information, where threats targeting the confidentiality and integrity of data in system memory are a significant concern.

The Confidential Computing Consortium (CCC), a project community at the Linux Foundation, plays a central role in defining and accelerating the adoption of confidential computing. The CCC brings together hardware vendors, cloud providers, and software developers to foster the development of TEE technologies and standards.

This cross-industry effort is essential due to the complex nature of confidential computing, which involves significant hardware changes and how programs, operating systems, and virtual machines are structured. Various projects under the CCC umbrella are advancing the field by developing open-source software and standards, which are crucial for developers working on securing data in use.

Confidential computing can be implemented in different environments, including public clouds, on-premise data centers, and distributed edge locations. This technology is vital for data privacy and security, multi-party analytics, regulatory compliance, data localization, sovereignty and residency. It ensures that sensitive data remains protected and compliant with local laws, even in a multi-tenant cloud environment.

The Ultimate Goal: Confidential AI

A confidential AI solution is a secure platform that uses hardware-based trusted execution environments (TEEs) to train and operate machine learning models on sensitive data. The TEEs enable training, fine-tuning and inferencing without exposing sensitive data or proprietary models to unauthorized parties.

Data owners and users can work with large language models (LLMs) on their data without revealing confidential information to unauthorized parties. Similarly, model owners can train their models while safeguarding their training data, model architecture and parameters. In the event of a data breach, hackers can only access encrypted data, not the sensitive data protected within TEEs.
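
As a rough illustration of that trust boundary, the sketch below encrypts data before it leaves the owner’s environment and releases the decryption key only when an enclave’s attestation evidence matches an expected measurement. The attestation step is deliberately simulated here; a real deployment would use the TEE vendor’s attestation SDK and a remote key-release service. The example assumes the open-source `cryptography` package, and all names and values are illustrative.

```python
from cryptography.fernet import Fernet  # pip install cryptography

# --- Simulated pieces (placeholders, not a real attestation protocol) --------
EXPECTED_MEASUREMENT = "sha256:expected-enclave-code-hash"

def verify_attestation(evidence: dict) -> bool:
    """Stand-in for remote attestation: in practice the TEE platform produces
    signed evidence of the exact code and hardware that is running."""
    return evidence.get("measurement") == EXPECTED_MEASUREMENT

# --- Data owner side ----------------------------------------------------------
key = Fernet.generate_key()          # key to be released only to an attested TEE
fernet = Fernet(key)
sensitive_prompt = b"Customer 4711: diagnosis codes E11.9, I10; summarize treatment history"
ciphertext = fernet.encrypt(sensitive_prompt)  # all a breached database would hold

# --- Key release, then processing "inside" the (simulated) TEE ---------------
evidence = {"measurement": "sha256:expected-enclave-code-hash"}  # simulated report
if verify_attestation(evidence):
    plaintext = Fernet(key).decrypt(ciphertext)  # only an attested enclave gets the key
    print("Inside TEE, model sees:", plaintext.decode())
else:
    print("Attestation failed: key withheld, data stays encrypted")
```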

However, confidential computing alone cannot prevent models from accidentally revealing details about the data they were trained on. Combining confidential computing with differential privacy mitigates this risk: data is processed within TEEs, and differentially private noise is applied to results before they are released, reducing the risk of leakage during inferencing.

Moreover, a confidential AI platform helps LLM and data providers comply with privacy laws and regulations. By safeguarding confidential and proprietary data using advanced encryption and secure TEE technology, model builders and providers have fewer concerns about the amount and type of user data they can collect.

Confidential computing technologies such as trusted execution environments provide the foundation for preserving privacy and intellectual property in AI systems. Confidential AI solutions allow more organizations to benefit from AI while building stakeholder trust and transparency when combined with techniques like differential privacy and thoughtful data governance policies.

Though much work still needs to be done, advancements in cryptography, secure hardware and privacy-enhancing methods point towards a future where AI can be deployed ethically. Still, we must continue advocating for responsible innovation and pushing platforms to empower individuals and organizations to control how their sensitive data is used.


Source: https://securityboulevard.com/2024/04/shadow-ai-the-murky-threat-to-enterprise-adoption-of-generative-ai/