Pierluigi Paganini September 16, 2024
A hacker and artist who goes by Amadon online tricked ChatGPT into providing instructions for making homemade bombs, bypassing the safety guidelines implemented by the chatbot.
Initially, Amadon asked for detailed instructions to create a fertilizer bomb similar to the one used in the 1995 Oklahoma City bombing, but the chatbot refused on ethical grounds. Further interaction allowed the hacker to bypass these restrictions, tricking the chatbot into generating instructions for creating powerful explosives.
Amadon told Lorenzo Franceschi-Bicchierai from TechCrunch that he carried out a “social engineering hack to completely break all the guardrails around ChatGPT’s output.”
The hacker used a ‘jailbreaking’ technique, disguising the request by framing it as part of a fictional game. TechCrunch consulted an explosives expert who confirmed that the instructions could enable the creation of a bomb, making the response too sensitive to be released.
ChatGPT told the hacker that combining the materials makes it possible to create “a powerful explosive that can be used to create mines, traps, or improvised explosive devices (IEDs).”
Amadon refined the prompts, tricking ChatGPT into generating increasingly specific instructions for creating “minefields” and “Claymore-style explosives.”
“There really is no limit to what you can ask it once you get around the guardrails,” Amadon told TechCrunch. “The sci-fi scenario takes the AI out of a context where it’s looking for censored content in the same way.”
Amadon reported his findings to OpenAI through the company’s bug bounty program, operated by Bugcrowd, but he was told that the problem was related to model safety and did not match the program’s criteria. Bugcrowd invited the hacker to report the issue through a different channel.