Here’s How We Made a Real-Time Phishing Website Detector for macOS
2024-8-26 15:53:11 Author: hackernoon.com

This real-time, on-device antiphishing solution for macOS takes reference-based detection to a new level, instantly warning Mac users they are on a phishing website.

First, background

How many unique phishing websites were published in 2023? The Anti-Phishing Working Group counted almost 5 million. At the beginning of 2024, MacPaw's cybersecurity division Moonlock reported on the AMOS stealer, which relies on fake websites of trustworthy brands to spread malware on Apple computers. Such websites not only infect our devices but also collect victims' credentials for malicious purposes. Spoofed websites are dangerous, so my team and I decided to do something about them.

The solution I describe below started as a proof-of-concept experiment at Moonlock. We ironed out the wrinkles at MacPaw's Technological R&D and presented the working prototype at STAST 2024. Our position paper describes the solution in detail and was originally uploaded to arXiv.org. For a sneak peek under the hood of our antiphishing app, please read on.

What do we have on our hands as of now?

Current antiphishing apps primarily use three detection methods: blacklisting, the classification-based approach, and the reference-based approach. Each method has its advantages, but all require further improvements. Let's explore each of them.

Blacklisting

The blacklist approach is practical and accurate, but it can't keep up with how quickly phishing websites spread. It's not always effective, since new phishing websites may not have been added to the list yet, and attackers often change URLs to dodge detection.

For instance, Google Safe Browsing uses lists of known phishing sites. When you try to visit a website, it checks the address against this list. If there's a match, it blocks access and warns you about the danger. But what if the website was published mere minutes ago? It won't be on the list, and the user will be trapped.
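The lookup described above can be sketched in a few lines of Python. This is a minimal illustration of list-based checking in general, not how Google Safe Browsing is actually implemented; the list entries and the name `is_blacklisted` are hypothetical.

```python
from urllib.parse import urlsplit

# Hypothetical list of known phishing hosts (illustrative values only).
BLACKLIST = frozenset({"dhl-parcel-update.example", "login-verify.example"})

def is_blacklisted(url, blacklist=BLACKLIST):
    """Return True if the URL's host appears on the list of known phishing hosts."""
    host = (urlsplit(url).hostname or "").lower()
    return host in blacklist
```

A site published minutes ago is simply absent from the set, so the check returns False for it, which is exactly the weakness of this method.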

Classification-based approach

In this antiphishing method, machine learning analyzes webpage features like URL structures, HTML content, and metadata to determine whether a website is spoofed or legitimate. Classification is excellent for browser extensions because it learns from user data to spot new phishing sites.
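To make the idea of webpage features concrete, here is a minimal Python sketch that turns a URL into a few values a classifier could consume. The feature set and the name `url_features` are assumptions chosen for illustration; real products use their own, usually much richer, features.

```python
from urllib.parse import urlsplit

def url_features(url):
    """Extract a few simple, illustrative features from a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "num_dots_in_host": host.count("."),
        "num_hyphens_in_host": host.count("-"),
        "has_at_symbol": "@" in url,
        "uses_https": parts.scheme == "https",
        # Crude check: a host made only of digits and dots looks like an IPv4 address.
        "host_is_ip": host.replace(".", "").isdigit(),
    }
```

A trained model would take such a feature dictionary, typically combined with HTML and metadata features, and output a phishing or legitimate label.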

The disadvantage here is that machine learning requires complex algorithms and lots of training data, while cybercriminals swiftly invent new obfuscation tactics to evade detection. This makes classification-based approaches less accurate and not ideal for standalone security products.

Reference-based approach

Some reference-based solutions are considered state-of-the-art. They use computer vision to analyze webpage appearances and effectively detect phishing websites. What we also see, however, is that reference-based solutions could be faster if they weren’t processing phishing cases in the cloud.

There's a critical time gap between a phishing website going live and the reference-based detection systems adding it to the list. We wanted to shrink this gap to ensure quicker detection and response.

How our native macOS antiphishing app works

Our goal was to warn Mac users about phishing websites as soon as they go live. To achieve this, we took the reference-based approach and improved it. We eliminated cloud processing and proposed doing all computations locally, aiming to cut detection time. As a bonus, our solution enhances privacy since all user data is processed on the device and doesn't go anywhere else.

We built a native macOS app using Swift, incorporating frameworks for screen capturing and machine learning. By converting our models to CoreML format, we ensured smooth performance and minimized the use of system resources. This way, our prototype continuously scans webpages in the background, protecting Mac users from phishing websites without requiring extra interactions.

The prototype works independently from browsers. The macOS Accessibility framework and Accessibility metadata help the app focus on certain regions of interest so it knows where to look for phishing.

Here is how it works in a nutshell.

First step: webpage analysis

When on a website, our app tries to understand the page layout. It identifies key page elements like logos, input fields, and buttons. For this task, we chose DETR with ResNet-50 because of its accuracy and performance.

In this step, it's important to recognize the placements of the elements on the website, particularly the area with a brand logo and forms for entering credentials.
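As a rough illustration of what this step hands to the next ones, the Python sketch below filters a detector's output down to the regions that matter later: logos and input fields. The `Detection` structure, the label names, and the 0.7 confidence threshold are hypothetical; the article only states that DETR with ResNet-50 is the detector.

```python
from dataclasses import dataclass

@dataclass
class Detection:
    label: str    # e.g. "logo", "input", "button" (hypothetical label set)
    score: float  # detector confidence in [0, 1]
    box: tuple    # (x, y, width, height) in pixels

def regions_of_interest(detections, wanted=("logo", "input"), min_score=0.7):
    """Keep confident detections of the wanted labels, grouped by label."""
    grouped = {label: [] for label in wanted}
    for det in detections:
        if det.label in grouped and det.score >= min_score:
            grouped[det.label].append(det)
    return grouped
```

Grouping by label keeps the logo region available for brand attribution and the input regions available for the credential check.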

Second step: brand attribution

Next, the prototype checks if a detected logo on the website matches any well-known brands. On top of that, it compares the webpage URL against a reference list of legitimate websites. If the website is official, we skip further steps.

On a side note, we were dismayed to see how many official domains brands use for marketing. It's no wonder phishing websites are so effective at tricking their victims. For example, DHL has several official domains like dhl.com, express.dhl, mydhli.com, dhlsameday.com, and dhlexpresscommerce.com.
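The URL comparison against a reference list can be sketched in Python as follows. The DHL domains are the ones named above; the function name `is_official` and the rule that subdomains of an official domain also count as official are assumptions made for this sketch, not details confirmed by the prototype.

```python
from urllib.parse import urlsplit

# Official DHL domains mentioned in this article; a real reference list would be longer.
OFFICIAL_DOMAINS = {
    "DHL": {
        "dhl.com",
        "express.dhl",
        "mydhli.com",
        "dhlsameday.com",
        "dhlexpresscommerce.com",
    },
}

def is_official(url, brand):
    """Return True if the URL's host is an official domain of `brand` or a subdomain of one."""
    host = (urlsplit(url).hostname or "").lower()
    domains = OFFICIAL_DOMAINS.get(brand, set())
    return any(host == d or host.endswith("." + d) for d in domains)
```

Matching on the end of the host rather than on any substring matters here: a lookalike host that merely starts with "dhl.com." is not treated as official.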

Third step: prevent credential harvesting

We classify the webpage into one of two categories: it either requests credentials or it does not. This step verifies whether a phishing website is trying to steal personal user information.

In the screenshot, our prototype found credential input fields, attributed the page to DHL, and checked the URL against the list of official DHL domains. The user got a phishing warning since the page does not belong to DHL.
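Put together, the three steps lead to a decision that can be sketched in Python like this. The function name `should_warn` and its inputs are hypothetical, and the choice not to warn when a page asks for no credentials is an assumption of this sketch rather than documented behavior of the prototype.

```python
def should_warn(brand, url_is_official, requests_credentials):
    """Decide whether to show a phishing warning.

    brand: name of the brand recognized from the logo, or None if none matched.
    url_is_official: True if the URL is on that brand's list of official domains.
    requests_credentials: True if the page was classified as asking for credentials.
    """
    if brand is None:       # no known brand is being imitated
        return False
    if url_is_official:     # legitimate site: skip further steps
        return False
    return bool(requests_credentials)
```

For the DHL case above, `should_warn("DHL", False, True)` returns True: the page carries the DHL logo, is not hosted on an official DHL domain, and asks for credentials.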

How accurate is the prototype?

Our system maintains or surpasses baseline accuracy while offering faster processing times. We achieved 90.8% accuracy in logo recognition and 98.1% in detecting credential input.

The graph below showcases our performance against other antiphishing solutions, and how we compare in precision, recall, and false positive rate. We proudly detected 87.7% of phishing attempts while keeping the false positive rate at a low 3.4%.
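For readers less familiar with these metrics, the Python sketch below shows how precision, recall, and false positive rate follow from confusion-matrix counts. The function name is hypothetical, and reading the share of phishing attempts detected as recall is my interpretation of the wording above.

```python
def detection_metrics(tp, fp, tn, fn):
    """Compute precision, recall, and false positive rate from confusion-matrix counts.

    tp: phishing pages flagged as phishing    fp: legitimate pages flagged as phishing
    tn: legitimate pages left alone           fn: phishing pages that were missed
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "false_positive_rate": false_positive_rate,
    }
```

Recall answers "how many phishing pages did we catch?", while the false positive rate answers "how often did we alarm users on a legitimate page?"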

It's fast and smooth, too

The final metrics demonstrate that our solution runs smoothly in the background without a noticeable loss of performance. CPU use is minimal: on an Apple M1 Mac with eight cores, our prototype uses just 16% of the available 800% capacity, or about 2% of the machine's total CPU. This consumption level is similar to three active Safari tabs or one Zoom call.

Final thoughts

There are plenty of antiphishing apps on the market, but most of them process data on external servers. Our prototype shows that the hardware in modern computers allows us to run machine learning models locally, on-device. We can use them to combat phishing and not worry about processing speeds and the use of system resources. Fortunately, the Apple ecosystem provides frameworks and tools for optimization.

Author: Ivan Petrukha, Senior Research Engineer at MacPaw Technological R&D, ex-Moonlock.


Source: https://hackernoon.com/heres-how-we-made-a-real-time-phishing-website-detector-for-macos?source=rss