Google Uses Its Big Sleep AI Agent to Find SQLite Security Flaw
November 6, 2024 | securityboulevard.com

Google researchers in June introduced Project Naptime, an initiative to explore how large language models (LLMs), which are getting better at comprehension and reasoning, can be used to smoke out security vulnerabilities.

Naptime evolved into Big Sleep, a collaboration between the company’s Project Zero team – created in 2014 to study security flaws in hardware and software – and its DeepMind AI unit. The project recently discovered its first real-world vulnerability: an exploitable stack buffer underflow in SQLite, an open source database engine.

“We believe this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software,” Big Sleep team members wrote in a report. “We think that this work has tremendous defensive potential. Finding vulnerabilities in software before it’s even released, means that there’s no scope for attackers to compete: the vulnerabilities are fixed before attackers even have a chance to use them.”

They noted that fuzzing – an automatic bug-finding technique – helps, but “we need an approach that can help defenders to find the bugs that are difficult (or impossible) to find by fuzzing, and we’re hopeful that AI can narrow this gap. We think that this is a promising path towards finally turning the tables and achieving an asymmetric advantage for defenders.”
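
To make the comparison concrete, here is a minimal sketch of what fuzzing looks like: mutate an input, run the target, and watch for crashes. The parse_record target and its planted off-by-one bug are invented for illustration; production fuzzers such as AFL or libFuzzer add coverage feedback and smarter mutation strategies, but the core loop is the same. Bugs whose triggers random mutation is unlikely to produce are exactly the ones Google hopes AI agents can reach.

```python
import random

def parse_record(data: bytes) -> None:
    """Hypothetical parser standing in for the code under test.

    It contains a planted off-by-one bounds check so the fuzzer
    has something to find.
    """
    if len(data) >= 4 and data[:2] == b"OK":
        length = data[2]
        if length <= len(data) - 3:   # off by one: should be len(data) - 4
            _ = data[3 + length]      # IndexError on the boundary case

def mutate(seed: bytes) -> bytes:
    """Flip a few random bytes in a copy of the seed input."""
    out = bytearray(seed)
    for _ in range(random.randint(1, 4)):
        out[random.randrange(len(out))] = random.randrange(256)
    return bytes(out)

seed = b"OK\x00payload!!"
for i in range(200_000):
    candidate = mutate(seed)
    try:
        parse_record(candidate)
    except IndexError:                # a "crash" in this toy setting
        print(f"iteration {i}: crashing input {candidate!r}")
        break
```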

Using AI to Find Bugs

Google’s report came at the same time that researchers at cybersecurity firm GreyNoise said their LLM-powered Sift honeypot analysis tool had discovered two vulnerabilities in live-streaming cameras that could have given hackers complete control over the Internet of Things (IoT) devices.

“This marks one of the first instances where threat detection has been augmented by AI to discover zero-day vulnerabilities,” they wrote in a report. “By surfacing malicious traffic that traditional tools would have missed, GreyNoise successfully intercepted the attack, identified the vulnerabilities, and reported them before they could be widely exploited.”

The idea of using AI as an offensive cybersecurity tool is gaining attention. The Cloud Security Alliance in August wrote that “the emergence of AI technology has triggered a profound transformation in the landscape of offensive security.”

“AI-powered tools can simulate advanced cyber attacks,” the organization wrote. “They can identify network, system, and software vulnerabilities before malicious actors can exploit them. They can help cover a broad range of attack scenarios, respond dynamically to findings, and adapt to different environments. These advancements have redefined AI from a narrow use case to a versatile and powerful general-purpose technology.”

Cobalt Labs analysts wrote that “Generative AI’s ability to rapidly analyze and adapt to new threats means that cybersecurity strategies can evolve at a pace that matches, or even outpaces, that of attackers.”

Fuzzing Isn’t Catching Everything

Google researchers argued that such AI capabilities are needed because fuzzing isn’t catching variants of vulnerabilities that were previously found and patched. The Big Sleep agent – this one based on Google’s Gemini 1.5 Pro model – is designed to follow the systematic approach a human researcher uses to identify and demonstrate security vulnerabilities.

The LLM uses specialized tools: one for browsing a code base in a manner similar to a human software engineer using Chromium Code Search, another that lets the agent run Python scripts in a sandboxed environment, a debugger for interacting with the program, and a reporter tool for recording the agent’s progress and results.
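
Google hasn’t published Big Sleep’s implementation, but the description matches the familiar LLM agent pattern of a model choosing tools in a loop. The sketch below is a minimal, hypothetical version of that loop: every name in it (browse_code, run_python, debug, report_finding, query_model) is an illustrative assumption, not Google’s API.

```python
# Illustrative sketch of an LLM agent loop with security-research tools.
# The stubs only mirror the tool categories described in the report.

def browse_code(symbol: str) -> str:
    """Look up a function or identifier in the target code base."""
    return f"<source listing for {symbol}>"

def run_python(script: str) -> str:
    """Execute a script in a sandbox, e.g. to construct a test input."""
    return f"<sandbox output of {len(script)}-byte script>"

def debug(command: str) -> str:
    """Run the target under a debugger and return the observed state."""
    return f"<debugger output for: {command}>"

TOOLS = {"browse_code": browse_code, "run_python": run_python, "debug": debug}

def research_loop(task: str, query_model) -> str:
    """Feed tool results back to the model until it reports a finding."""
    transcript = [task]
    while True:
        # query_model stands in for an LLM call returning a dict such as
        # {"tool": "browse_code", "argument": "some_function_name"}.
        action = query_model(transcript)
        if action["tool"] == "report_finding":
            return action["argument"]     # e.g. a crashing test case
        result = TOOLS[action["tool"]](action["argument"])
        transcript.append(result)
```

In a real system the transcript would carry full conversation state and the debugger would expose breakpoints and memory inspection; the point here is only the shape of the loop.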

Google reported the vulnerability to SQLite developers early last month, and they fixed it the same day, ensuring that it didn’t get into an official release where it could harm users. The security flaw could have led to arbitrary code execution – enabling attackers to run whatever code they wanted – or a crash of the system.

“We collected a number of recent commits to the SQLite repository, manually removing trivial and documentation-only changes,” the researchers wrote. “We then adjusted the prompt to provide the agent with both the commit message and a diff for the change, and asked the agent to review the current repository … for related issues that might not have been fixed.”
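
The post describes this setup only at the level quoted above. As a rough, hypothetical sketch of the workflow, the code below collects recent commits from a local SQLite checkout and assembles a variant-analysis prompt; the function names, the filtering step (which the team performed manually), and the prompt wording are all assumptions for illustration.

```python
import subprocess

def recent_commits(repo: str, n: int = 20) -> list[str]:
    """Return the hashes of the n most recent commits in the repo."""
    out = subprocess.run(
        ["git", "-C", repo, "log", f"-{n}", "--format=%H"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()

def commit_message_and_diff(repo: str, sha: str) -> str:
    """Return a commit's message together with its diff."""
    out = subprocess.run(
        ["git", "-C", repo, "show", sha],
        capture_output=True, text=True, check=True,
    )
    return out.stdout

def build_variant_prompt(repo: str, sha: str) -> str:
    """Assemble a variant-analysis prompt (wording is invented)."""
    change = commit_message_and_diff(repo, sha)
    return (
        "The following recent change to the repository may fix, or hint "
        "at, a class of bug:\n\n"
        f"{change}\n\n"
        "Review the current state of the repository for related issues "
        "that this change might not have fixed, and describe any "
        "candidate vulnerability along with a way to reproduce it."
    )

# Trivial and documentation-only commits were filtered out by hand in
# Google's experiment; a real pipeline would need a similar step here.
repo = "/path/to/sqlite"                  # hypothetical local checkout
for sha in recent_commits(repo, n=5):
    prompt = build_variant_prompt(repo, sha)
    # each prompt would then be handed to the agent for review
```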

They noted that the Big Sleep project is still in the research phase but that it could “lead to a significant advantage to defenders – with the potential not only to find crashing testcases, but also to provide high-quality root-cause analysis, triaging and fixing issues could be much cheaper and more effective in the future.”

Source: https://securityboulevard.com/2024/11/google-uses-its-big-sleep-ai-agent-to-find-sqlite-security-flaw/