As highlighted in our guide to web scraping with Puppeteer, this browser automation library is a fantastic ally for extracting data from dynamic content sites. Still, like any other tool, it has its shortcomings. That's where Puppeteer Extra steps in!
In this guide, we’ll introduce you to puppeteer-extra
—a library that wraps puppeteer
to extend it with plugin support. Get ready to take your Puppeteer scraping project to the next level! 🚀
Puppeteer Extra is a lightweight wrapper around puppeteer
that enables plugin integration through a clean interface. Although it's not developed by the team behind Puppeteer, this community-driven project has hundreds of thousands of weekly downloads and over 6k stars on GitHub 📈.
Check out the GitHub stars chart below—it’s clear that the puppeteer-extra
repo has been on a steady rise in popularity over the years:
The plugins officially supported by Puppeteer Extra are:
User-Agent
header on all pages, with support for dynamic replacing.On top of those, it integrates with the following community plugins:
No doubt, Puppeteer is one of the top headless browser libraries for scraping and testing. But let’s be honest—it has its limits, especially when facing anti-bot tech like browser fingerprinting and CAPTCHAs. Read our guide to learn how to deal with reCAPTCHA automation.
Websites armed with anti-bot defenses can easily detect and block Puppeteer scripts. If only there was a way to extend and customize Puppeteer’s default behavior...
…well, that’s exactly what Puppeteer Extra is all about!
Puppeteer Extra is like a power-up for Puppeteer, adding plugin support to tackle those major drawbacks. Instead of overriding or extending everything for you, it wraps Puppeteer and lets you register only the plugins you need. 🦸
puppeteer-extra
: Setup and Plugins for Web ScrapingYou can add Puppeteer Extra to your project's npm dependencies with:
npm install puppeteer-extra
⚠️ Note: puppeteer-extra
requires puppeteer
to work, so make sure both packages are installed in your project.
Then, you have to import the puppeteer
object from puppeteer-extra
instead of the puppeteer
library:
const puppeteer = require("puppeteer-extra")
// for ESM users:
// const { puppeteer } from "puppeteer-extra"
Everything in the Puppeteer API stays the same, but you get a little extra magic ✨. The puppeteer
object now exposes a use()
method to plug in Puppeteer Extra plugins.
Time to dive into what these plugins can do, and see how they’ll level up your web scraping game!
Puppeteer Extra Plugin Stealth, also known simply as Puppeteer Stealth, includes a set of configurations designed to reduce bot detection. It overrides Puppeteer’s detectable properties and settings that might expose it as a bot.
For more details, check out our guide on how to avoid getting blocked with Puppeteer Stealth.
⚙️ Installation:
npm install puppeteer-extra-plugin-stealth
💡 Usage:
const StealthPlugin = require("puppeteer-extra-plugin-stealth")
// for ESM users:
// import StealthPlugin from "puppeteer-extra-plugin-stealth"
puppeteer.use(StealthPlugin())
A plugin to prevent the Puppeteer browser from loading specific resources. The supported resource types include document
, stylesheet
, image
, media
, font
, script
, texttrack
, xhr
, fetch
, eventsource
, websocket
, manifest
, other
.
Resource blocking can be configured both globally and locally.
⚙️ Installation:
npm install puppeteer-extra-plugin-block-resources
💡 Usage:
const BlockResourcesPlugin = require("puppeteer-extra-plugin-block-resources")
// for ESM users:
// import BlockResourcesPlugin from "puppeteer-extra-plugin-block-resources"
You can then configure the resources to block globally on all pages:
puppeteer.use(BlockResourcesPlugin({
blockedTypes: new Set(["image", "stylesheet"]),
}))
Similarly, you can locally select the resources to be blocked:
puppeteer.use(BlockResourcesPlugin()
const browser = await puppeteer.launch()
const page = await browser.newPage()
blockResourcesPlugin.blockedTypes.add("stylesheet")
await page.goto("https://www.example.com/", { waitUntil: "domcontentloaded" })
A plugin to anonymize the User-Agent
set by the browser controlled by Puppeteer. 🎭
It gives you the ability to strip the 'Headless'
string from the Chrome user agent in headless mode and supports dynamic replacement of the user agent through a custom function. See it in action in our Puppeteer user agent guide.
Discover what’s the best user agent for web scraping!
⚙️ Installation:
npm install puppeteer-extra-plugin-anonymize-ua
💡 Usage:
const AnonymizeUAPlugin = require("puppeteer-extra-plugin-anonymize-ua")
// for ESM users:
// import AnonymizeUAPlugin from "puppeteer-extra-plugin-anonymize-ua"
Next, you can configure the anonymous user agent:
puppeteer.use(AnonymizeUAPlugin({
stripHeadless: true,
}))
Also, you can set a dynamic user agent via a custom function:
puppeteer.use(AnonymizeUAPlugin({
customFn: (ua) => ua.replace("Chrome", "Chromium")})
}))
Just like with Playwright, no matter how slick and customized your Puppeteer script is, advanced anti-bot systems can still sniff you out and shut you down. But how is that even possible? 🤔
The puppeteer-extra-stealth-plugin
documentation breaks it down for you:
Please note: I consider this a friendly competition in a rather interesting cat and mouse game. If the other team (👋) wants to detect headless chromium there are still ways to do that (at least I noticed a few, which I'll tackle in future updates).
It's probably impossible to prevent all ways to detect headless chromium, but it should be possible to make it so difficult that it becomes cost-prohibitive or triggers too many false-positives to be feasible.
So, while Puppeteer Extra can dodge most basic bot detection like Neo in Matrix, it can’t surely bypass Cloudflare. Sure, you could integrate a proxy into Puppeteer, but even that might not be enough.
The problem isn't Puppeteer itself (because let's be real, Puppeteer rocks! 🤘), but the browser it's controlling. The real solution? A powerful browser that:
Believe it or not, this isn't some distant dream. It's real, and it's exactly what Bright Data’s Scraping Browser has to offer!
Puppeteer is one of the most widely used browser automation tools in the tech world, but even superheroes have their limits. The community stepped in with puppeteer-extra
, a package that gives Puppeteer some seriously cool new abilities through custom plugins.
But here's the thing: while these plugins can make your scraping operation way stronger, they won’t magically turn you into a ghost 👻. Sites with advanced bot detection might still be able to block you!
Bypass all anti-bots with Bright Data’s Scraping Browser—a non-detectable cloud browser that integrates seamlessly with Puppeteer. Join our mission to make the Web a public space for everyone, everywhere, even through automated scripts.
Until next time, keep exploring the Internet with freedom! 🌐