UL NO. 456: A Deep-dive on Prompt Injection

2024-10-29 22:15 | Author: danielmiessler.com

SECURITY | AI | PURPOSE
UNSUPERVISED LEARNING is a newsletter about upgrading to thrive in a world full of AI. It’s original ideas, analysis, mental models, frameworks, and tooling to prepare you for the world that’s coming.

SECURITY

Apple is offering $1,000,000 to hack its Private Cloud Compute (PCC) system, which is its new, proprietary cloud system it built to handle Apple Intelligence requests that can’t be done on-device. MORE

🧠A New Way to Think About Why Security Awareness Doesn’t Work
💡Had an absolutely brilliant conversation with Cornelia Puhze at the Swiss Cyberstorm speaker dinner. She’s an expert on security awareness, and we talked about why most programs don’t work. Her premise is that the only model that will work is one that interrupts System 1 thinking and gives us a chance to engage System 2.

🤯

In other words, the attacks are getting so good that you’re not thinking—you’re reacting. So all the traditional training in the world won’t help you because you’re not in the mindset where training CAN work. And this only gets worse with AI-written spearphishing that’s perfectly targeted to your personality flaws.

We talked about how the only defense is something like Dialectical Behavior Therapy and similar techniques—that teach you how to PAUSE when you become excited or anxious or stressed or whatever. Which is fascinatingly and strangely related to mindfulness.

Anyway, just love this concept so much because it cleanly explains why security awareness training fails so spectacularly, and hints at a new way of training that could work. Go follow Cornelia’s work.

—

This is a SUPER cool demo but I’m not sure I’d classify it as prompt injection.
The issue is that the instruction on the site is to run a program. And Computer Use is designed to follow instructions.
So the demo is showing that computers will follow dangerous instructions.
— ᴅᴀɴɪᴇʟ ᴍɪᴇssʟᴇʀ (@DanielMiessler)
10:14 AM • Oct 25, 2024

If you go through the whole thread it all comes down to definitions—as usual. My point was that if you tell an AI agent to eat poison—and it eats it and gets hurt—that’s NOT prompt injection. It’s a direct instruction followed by an agent.

So my take was that if you tell an agent to go to a website and download an executable and execute it—that’s the same. It’s like telling your computer to rm -rf. It’ll do it. And that’s not injection, it’s just a dangerous command.

But what’s super important here is WHO is asking for a given thing to happen, and what they EXPECTED would happen. You have to look at the implied goal of the REQUESTOR, and compare THAT to what ACTUALLY happens.

So if the requestor said:

Go execute commands on this possibly dangerous website.

That would not be prompt injection because it was just following commands.

What I missed in this particular case was that the initial command sent to the tool wasn’t to go and do what was on the website, but to just load the site. So the implied expectation of the REQUESTOR was normal browsing—not downloads and executions. So, given my definition above, and this initial setup—I’d call myself wrong about my original take.

Here’s the definition I have in my Real World AI Definitions now, updated to magnify the importance of this wrinkle. And great research by Johann Rehberger!

Prompt Injection is an attack technique that uses specially crafted input to trick an AI into doing something that violates intent/expectation and leads to a negative outcome.
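
To make the "who asked, and what did they expect" test concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the action labels, the `Request` and `Action` types, and the `violates_intent` helper are hypothetical names, not part of any real agent framework. It models only the intent/expectation half of the definition above and does not try to judge whether the outcome is negative.

```python
from dataclasses import dataclass

# Hypothetical action labels, invented for this sketch.
BROWSE, DOWNLOAD, EXECUTE = "browse", "download", "execute"


@dataclass(frozen=True)
class Request:
    text: str            # what the requestor actually typed
    expected: frozenset  # actions that request reasonably implies (an assumption we supply)


@dataclass(frozen=True)
class Action:
    kind: str    # what the agent is about to do
    origin: str  # "requestor", or "content" for text the agent read along the way


def violates_intent(request: Request, action: Action) -> bool:
    """Flag actions that come from content and fall outside what the requestor expected."""
    return action.origin == "content" and action.kind not in request.expected


# The requestor only asked to load the page; the page tells the agent to run a binary.
just_browse = Request("Load this site.", frozenset({BROWSE}))
# The requestor explicitly asked the agent to do whatever the site says.
do_anything = Request(
    "Go execute commands on this possibly dangerous website.",
    frozenset({BROWSE, DOWNLOAD, EXECUTE}),
)
page_says_run = Action(EXECUTE, origin="content")

print(violates_intent(just_browse, page_says_run))  # True
print(violates_intent(do_anything, page_says_run))  # False
```

The same page instruction lands on opposite sides of the line depending on the original request, which is the wrinkle. A dangerous command that comes straight from the requestor (the rm -rf case) is never flagged by this check, because it is a direct instruction rather than an injection.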

VMware has released updates for vCenter Server to fix a critical remote code execution vulnerability, CVE-2024-38812, with a CVSS score of 9.8. MORE

The Biden administration released the first National Security Memorandum on AI. I love its focus on not losing to China, and on making sure AI is safe, secure, and trustworthy. It also focused a lot on being aligned with democratic (small d) values. MORE | THE MEMORANDUM

Fortinet has disclosed a critical vulnerability, CVE-2024-47575, in FortiManager, actively exploited in the wild. Known as FortiJump, this flaw allows remote code execution via the FGFM protocol and affects FortiManager and FortiAnalyzer models. MORE

Salt Typhoon (China affiliated) is suspected of breaching major telecom companies, targeting American political figures like Kamala Harris, Charles Schumer, Donald Trump, and J.D. Vance. MORE

TSMC has stopped doing business with a client after finding out that chips were being sent to Huawei, which is under US sanctions. The whole game for China now is to find proxies to buy through, or to use services like AWS that can hook up NVIDIA chips. MORE

Russia amplified false claims about U.S. hurricane responses to manipulate political discourse before the presidential election, according to the Institute for Strategic Dialogue. MORE

Both US parties are worried about last-minute deepfakes that create chaos and/or move the election. MORE

Speaking of that 👆🏼, the FBI says Russian actors created a fake video showing mail-in ballots for Trump being destroyed in Pennsylvania. MORE

AI / TECH

Google is working on "Project Jarvis," an AI agent for Chrome that automates web tasks like research and booking flights. Powered by Gemini 2.0, Jarvis takes screenshots to interpret and act on tasks. MORE

💡This will be Google’s first move into the all-seeing digital assistant space, and I like to see it only because it will increase pressure on everyone to release theirs.

But I think this implementation is short-sighted due to it being browser-based. They really need “Jarvis” to live deeply in the OS, which is where Apple will be heading soon.

World models, or world simulators, are emerging as a significant path for developing AI, and I’m really excited about the direction. MORE

💡I personally feel (as a non-expert in the weeds) that there will be a certain point of world model development (combined with post-training) that will unlock both AGI and ASI—although it might not be needed for AGI.

In other words, if an AI understands enough of how the world works, and it understands how to do science (conjecture, experiment design, and testing), that might be all it needs.

Plus, even if it’s not, it’s also the path to self-improvement.

TSMC's Phoenix chip plant is outperforming its Taiwan facilities in producing usable chips, according to a company executive on a webinar. Let’s go in-country production! MORE

Tesla's Cybertruck is outselling nearly every other electric vehicle in the US. That was quick. Like two months ago they were a laughing stock. MORE

Waymo just raised $5.6 billion in a Series C to expand to new cities. MORE

Determinate Systems is trying to make Nix the go-to for software development by enabling flakes, streamlining private repositories, and improving dependency management. MORE

💡Dammit. These people are going to make me learn Nix aren’t they?

It’s hit my radar enough in the last year that I’m going to take a few days and learn the religion.

NASDAQ CEO Adena Friedman isn't shocked that startup IPOs haven't bounced back in 2024. She says while the S&P 500 is up 22%, it's mainly due to large-cap companies like Apple and Microsoft, while small-cap companies are struggling. MORE

HUMANS

Researchers have traced 70% of meteorites to three major collisions in the asteroid belt over the last 40 million years. MORE

The US economy is leading the G7 with a projected 2.8% GDP growth. US workers are more productive, generating $171,000 in goods and services annually, compared to $120,000 in Europe and $96,000 in Japan. MORE

Elon Musk has reportedly been in regular contact with Russian President Vladimir Putin since late 2022, which is highly disturbing to me. Probably unrelated, but Elon has seemed a lot less supportive of Ukraine lately. 👎🏼 MORE

Russian lawmakers have ratified a pact with North Korea for mutual military assistance and 3,000 North Korean troops have been deployed to Russia. And South Korea is thinking about sending help to Ukraine as a result. MORE | MORE

Character amnesia is becoming a widespread issue in China, where even well-educated individuals are forgetting how to write common Chinese characters. MORE

A study in Alzheimer's & Dementia suggests semaglutide, found in Ozempic and Wegovy, may lower Alzheimer's risk in Type 2 diabetes patients. The research compared semaglutide to seven other diabetes drugs and found a 70% lower Alzheimer's risk compared to insulin. MORE

Walking in short bursts can burn 20-60% more energy compared to continuous walking over the same distance. MORE

DISCOVERY

My friend Matt Johansen highlights the psychological toll of working in security (especially in SOCs), including decision fatigue, anxiety, and sleep disruptions. MORE

Google just launched a new 10-hour course called Prompting Essentials to help people write better AI prompts. MORE

An Ode To Vim MORE

PabloNet — A wall-mounted diffusion mirror turns webcam reflections into AI-generated paintings using StreamDiffusion. The setup includes a Raspberry Pi 5, a 10.1" Pi screen, infrared light, and a Pi camera, all housed in a generic frame. MORE

Japan has introduced a digital nomad visa, and Christian Mack shared his experience of getting one. MORE

IRIS — A new approach called IRIS combines large language models (LLMs) with static analysis to detect security vulnerabilities in software. Using a dataset called CWE-Bench-Java, IRIS detected 69 out of 120 vulnerabilities in Java projects, outperforming traditional static analysis tools that found only 27. MORE

School is Not Enough: Learning is a consequence of doing MORE

llm-whisper-api — Simon Willison created a quick plugin for LLM to experiment with the OpenAI Whisper API. You can install it using llm install llm-whisper-api and run it with llm whisper-api myfile.mp3. MORE

simpletext — A text-only blog engine using Cloudflare Workers and KV store. It's designed to be lightweight and efficient, leveraging Cloudflare's infrastructure for hosting and data storage. MORE

The Most Important Sentence MORE

One of the weirdest features of the web I know of—text fragments let you link directly to specific text on a webpage without needing an anchor, using a special URL syntax. It even highlights the text when you land on the link. MORE
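
For anyone who wants to try it, the special syntax is a `#:~:text=` directive appended to the URL. Below is a small Python sketch that builds such a link. The `text_fragment_url` helper and the example.com address are made up for illustration, and the sketch covers only the simplest form of the directive (a single exact phrase, with any existing fragment dropped).

```python
from urllib.parse import quote


def text_fragment_url(page_url: str, text: str) -> str:
    """Build a link that scrolls to and highlights `text` on the page."""
    # Percent-encode the phrase. quote() leaves "-" alone, but a dash is
    # syntax inside a text directive, so it is encoded by hand.
    encoded = quote(text, safe="").replace("-", "%2D")
    # Simplification for this sketch: discard any fragment already on the URL.
    base = page_url.split("#", 1)[0]
    return f"{base}#:~:text={encoded}"


print(text_fragment_url("https://example.com/post", "text fragments"))
# https://example.com/post#:~:text=text%20fragments
```

Browsers that don't support text fragments should simply load the page as usual, so these links degrade gracefully.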

RECOMMENDATION OF THE WEEK

The counterforce to election stress is reading some older good books. Here’s a great list to choose from.

1. Gödel, Escher, Bach: An Eternal Golden Braid by Douglas Hofstadter

2. Zen and the Art of Motorcycle Maintenance by Robert M. Pirsig

3. The Book: On the Taboo Against Knowing Who You Are by Alan Watts

4. The Structure of Scientific Revolutions by Thomas S. Kuhn

5. Finite and Infinite Games by James P. Carse

6. Seeing Like a State by James C. Scott

7. The Spell of the Sensuous by David Abram

8. Ishmael by Daniel Quinn

9. Mind and Nature: A Necessary Unity by Gregory Bateson

10. Small Is Beautiful: Economics as if People Mattered by E.F. Schumacher

APHORISM OF THE WEEK

❝

What you don’t change, you choose.

Laurie Buchanan

Thank you for reading. Please forward to a friend and/or share on socials to help support the work.

🫶🏼

Daniel