This real-time, on-device antiphishing solution for macOS takes reference-based detection to a new level, instantly warning Mac users they are on a phishing website.
How many unique phishing websites were published in 2023? The Antiphishing Working Group
The solution I describe below started as a proof-of-concept experiment at
Current antiphishing apps primarily use three detection methods: blacklisting, classification-based detection, and reference-based detection. Each method has its advantages, but all of them leave room for improvement. Let's explore each in turn.
The blacklist approach is practical and accurate, but it can't keep up with how quickly phishing websites spread. It isn't always effective because a new phishing website may not have been added to the list yet, and attackers often change URLs to dodge detection.
For instance, Google Safe Browsing uses lists of known phishing sites. When you try to visit a website, it checks the address against this list. If there's a match, it blocks access and warns you about the danger. But what if the website was published mere minutes ago? It won't be on the list, and the user will be trapped.
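To make the limitation concrete, here is a deliberately simplified sketch of a list lookup. It is not how Google Safe Browsing is implemented; the `BLOCKLIST` entries and the `is_blocklisted` helper are hypothetical names used only for illustration.

```python
from urllib.parse import urlsplit

# Hypothetical blocklist entries, for illustration only.
# Real services distribute far larger, frequently updated lists.
BLOCKLIST = {"login-dhl-parcel.example", "secure-paypa1.example"}


def is_blocklisted(url: str, blocklist: set[str] = BLOCKLIST) -> bool:
    """Return True if the URL's host appears on the blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    return host in blocklist


# A known phishing host is caught...
print(is_blocklisted("https://login-dhl-parcel.example/track"))
# ...but a site published minutes ago is not on the list yet.
print(is_blocklisted("https://fresh-phish.example/"))
```

The second lookup is the gap described above: the check itself is fast and precise, but it can only recognize what has already been reported.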
In the classification-based approach, machine learning analyzes webpage features like URL structures, HTML content, and metadata to determine whether a website is spoofed or legitimate. Classification is well suited to browser extensions because it learns from user data to spot new phishing sites.
The disadvantage here is that machine learning requires complex algorithms and lots of training data, while cybercriminals swiftly invent new obfuscation tactics to evade detection. This makes classification-based approaches less accurate and not ideal for standalone security products.
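As a rough illustration of what "URL structure" features can look like, the sketch below turns a URL into a small numeric feature dictionary that a classifier could consume. The feature set and the `url_features` helper are assumptions chosen for illustration, not the features used by any particular product.

```python
import re
from urllib.parse import urlsplit


def url_features(url: str) -> dict[str, float]:
    """Extract a few simple lexical features from a URL (illustrative only)."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": float(len(url)),
        "num_dots_in_host": float(host.count(".")),
        "num_hyphens_in_host": float(host.count("-")),
        # Hosts given as a raw IPv4 address rather than a domain name.
        "host_is_ipv4": float(bool(re.fullmatch(r"(\d{1,3}\.){3}\d{1,3}", host))),
        "has_at_sign": float("@" in url),
        "uses_https": float(parts.scheme == "https"),
    }


print(url_features("http://192.168.0.1/login"))
```

Features like these are cheap to compute, which is also why they are easy for attackers to manipulate once they know what a model looks at.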
Some of the reference-based solutions are considered state-of-the-art. They use computer vision to analyze how a webpage looks and detect phishing websites effectively. However, we also see that reference-based solutions could be faster if they weren't processing phishing cases in the cloud.
There's a critical time gap between a phishing website going live and the reference-based detection systems adding it to the list. We wanted to shrink this gap to ensure quicker detection and response.
Our goal was to warn Mac users about phishing websites as soon as they go live. To achieve this, we took the reference-based approach and improved it. We eliminated cloud processing and moved all computations onto the device, aiming to cut detection time. As a bonus, our solution enhances privacy, since all user data is processed on the device and doesn't go anywhere else.
We built a native macOS app using Swift, incorporating frameworks for
Here is how it works in a nutshell.
When on a website, our app tries to understand the page layout. It identifies key page elements like logos, input fields, and buttons. For this task, we chose
In this step, it's important to recognize the placements of the elements on the website, particularly the area with a brand logo and forms for entering credentials.
Next, the prototype checks whether a logo detected on the website matches any well-known brand. In addition, it compares the webpage URL against a reference list of legitimate websites. If the website is official, we skip further steps.
On a side note, we were dismayed to see how many official domains brands use for marketing. It's no wonder phishing websites are so effective at tricking their victims. For example, DHL has several official domains like dhl.com, express.dhl, mydhli.com, dhlsameday.com, and dhlexpresscommerce.com.
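The URL check against a reference list can be sketched as follows. This is a minimal illustration, not the prototype's code: the `OFFICIAL_DOMAINS` table holds only the DHL domains quoted above, and `is_official` is a hypothetical helper. The assumption here is that a host counts as official when it equals a listed domain or is a subdomain of one.

```python
from urllib.parse import urlsplit

# Only the domains quoted in the text; a real reference list
# would cover many brands and be kept up to date.
OFFICIAL_DOMAINS = {
    "DHL": {
        "dhl.com",
        "express.dhl",
        "mydhli.com",
        "dhlsameday.com",
        "dhlexpresscommerce.com",
    },
}


def is_official(url: str, brand: str) -> bool:
    """Return True if the URL's host is a listed domain or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    for domain in OFFICIAL_DOMAINS.get(brand, set()):
        # Match on a label boundary so "notdhl.com" is not treated as "dhl.com".
        if host == domain or host.endswith("." + domain):
            return True
    return False


print(is_official("https://www.dhl.com/track", "DHL"))
print(is_official("https://dhl.com.evil.example/", "DHL"))
```

Matching on a label boundary matters: a naive substring test would accept hosts such as `dhl.com.evil.example`, which is exactly the kind of lookalike address phishing pages rely on.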
We classify the webpage into one of two categories: pages that request credentials and pages that do not. This step checks whether the website is trying to collect personal user information.
In the screenshot, our prototype found credential input fields, attributed the page to DHL, and checked the URL against the list of official DHL domains. The user got a phishing warning since the page does not belong to DHL.
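Put together, the decision described in these steps could look like the sketch below. The `PageAnalysis` type, the `verdict` function, and the verdict labels are hypothetical. How the prototype treats pages with no recognized brand, or unofficial pages that do not ask for credentials, is not stated in the text, so those two branches are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PageAnalysis:
    detected_brand: Optional[str]  # brand whose logo was recognized, if any
    url_is_official: bool          # URL matched the brand's reference list
    requests_credentials: bool     # credential input fields were found


def verdict(page: PageAnalysis) -> str:
    """Combine the three signals into a single outcome."""
    if page.detected_brand is None:
        return "no_brand"          # assumption: nothing to compare against
    if page.url_is_official:
        return "legitimate"        # official site: skip further steps
    if page.requests_credentials:
        return "phishing_warning"  # brand look, unofficial URL, asks for credentials
    return "no_warning"            # assumption: no credential request, no warning


# The DHL case from the screenshot: DHL logo, unofficial URL, login form.
print(verdict(PageAnalysis("DHL", url_is_official=False, requests_credentials=True)))
# prints "phishing_warning"
```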
Our system maintains or surpasses baseline accuracy with faster processing times. We achieved 90.8% accuracy in logo recognition and 98.1% in detecting credential input.
The graph below showcases our performance against other antiphishing solutions, and how we compare in precision, recall, and false positive rate. We proudly detected 87.7% of phishing attempts while keeping the false positive rate at a low 3.4%.
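For reference, these are the standard definitions of the three metrics, where TP, FP, TN, and FN are the counts of true positives, false positives, true negatives, and false negatives. Reading the 87.7% detection figure as recall is an assumption based on how the sentence is phrased.

```latex
\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{False positive rate} = \frac{FP}{FP + TN}
\]
```

Under that reading, a 3.4% false positive rate means that roughly 34 of every 1,000 legitimate pages would trigger a warning.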
The final metrics demonstrate that our solution runs smoothly in the background without a noticeable loss of performance. CPU use is minimal: on an eight-core Apple M1 Mac, where full load is reported as 800% (100% per core), our prototype uses just 16%, or about 2% of the total capacity. This consumption level is similar to three active Safari tabs or one Zoom call.
There are plenty of antiphishing apps on the market, but most of them process data on external servers. Our prototype shows that the hardware in modern computers lets us run machine learning models locally, on the device. We can use them to combat phishing without worrying about processing speed or the use of system resources. Fortunately, the Apple ecosystem provides frameworks and tools for optimization.
Author: Ivan Petrukha, Senior Research Engineer at MacPaw Technological R&D, ex-Moonlock.