
Puppeteer is a Node.js library that we can use to control a headless Chrome instance. We are basically using Chrome, but driving it programmatically with JavaScript.

Using it, we can:


  • scrape web pages
  • automate form submissions
  • perform any kind of browser automation
  • track page loading performance
  • create server-side rendered versions of single page apps
  • take screenshots
  • create automated tests
  • generate PDFs from web pages

It’s built by Google. It doesn’t unlock anything new per se, but it abstracts away many of the nitty-gritty details we would otherwise have to deal with ourselves.

In short, it makes things very easy.


Since it spins up a new Chrome instance when it’s initialized, it might not be the most performant option. It is, however, the most precise way to automate testing with Chrome, since it uses the actual browser under the hood.

To be precise, it uses Chromium, the open source part of Chrome. This mostly means you don’t get the proprietary codecs that are licensed by Google and can’t be open sourced (MP3, AAC, H.264, ...), and you don’t get the integration with Google services such as crash reports, Google Update and more. From a programmatic standpoint, though, it should behave identically to Chrome (except for media playback, as noted).

Installing Puppeteer

Start by installing it using


npm install puppeteer

in your project.


This will download and bundle a recent version of Chromium that is guaranteed to work with the installed version of Puppeteer.


Alternatively, you can install puppeteer-core instead. It doesn’t download a browser, so you can point it at a Chrome installation you already have, which is useful in some special cases (see puppeteer vs puppeteer-core). Usually, you’d just go with puppeteer.
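As a minimal sketch of the puppeteer-core route, the helper below launches an existing Chrome installation by passing the executablePath launch option. The helper name launchLocalChrome is hypothetical. It takes the puppeteer-core module as a parameter so it is easy to reuse and test. The Chrome path in the usage note is an assumption and depends on your OS and install location.

```javascript
// Launch a locally installed Chrome through puppeteer-core.
// puppeteer-core does not download a browser, so here we tell it
// where the Chrome binary lives via the executablePath launch option.
function launchLocalChrome(puppeteerCore, executablePath, { headless = true } = {}) {
  if (!executablePath) {
    throw new Error('Pass the path to a local Chrome binary');
  }
  return puppeteerCore.launch({ executablePath, headless });
}
```

With the real package installed, usage would look like launchLocalChrome(require('puppeteer-core'), '/usr/bin/google-chrome'), where the path is only an example of a typical Linux location.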

Using Puppeteer

In a Node.js file, require it:


const puppeteer = require('puppeteer');

then we can use the launch() method to create a browser instance:


(async () => {
  const browser = await puppeteer.launch()
})()

We can also write it like this:


puppeteer.launch().then(async browser => {
  //...
})

You can pass an object with options to puppeteer.launch(). The most common one is


puppeteer.launch({ headless:false })

to show Chrome while Puppeteer is performing its operations. It’s nice to see what’s happening, and it helps with debugging.
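For debugging, headless: false is often combined with the slowMo launch option. slowMo slows down each Puppeteer operation by the given number of milliseconds so you can follow along. A minimal sketch; the helper name debugLaunchOptions and the 250 ms default are arbitrary choices:

```javascript
// Build launch options for a visible, slowed-down browser session.
// slowMo delays every Puppeteer operation by the given number of milliseconds.
function debugLaunchOptions(slowMo = 250) {
  return { headless: false, slowMo };
}

// Usage (with Puppeteer installed):
//   const browser = await puppeteer.launch(debugLaunchOptions())
```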

We use await, and so we must wrap this method call in an async function, which we immediately invoke.


Next we can use the newPage() method on the browser object to get the page object:


(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
})()

Next up we call the goto() method on the page object to load that page:


(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://website.com')
})()

We could use promises as well, instead of async/await, but using the latter makes things much more readable:


(() => {
  puppeteer.launch().then(browser => {
    browser.newPage().then(page => {
      page.goto('https://website.com').then(() => {
        //...
      })
    })
  })
})()

Getting the page content

Once we have a page loaded with a URL, we can get the page content by calling the evaluate() method of page:


(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://website.com')
  const result = await page.evaluate(() => {
    //...
  })
})()

This method takes a callback function, in which we add the code needed to retrieve the page elements we’re interested in. The callback runs inside the page, and whatever it returns (typically a new object) becomes the result of our evaluate() method call.
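For instance, the callback can collect several values at once and return them as a plain object. This is a sketch. The helper name getPageSummary and the data it extracts (title, link count, first link) are only illustrative. Only serializable values survive the trip back to Node.js.

```javascript
// Collect a small summary of the current page.
// The callback passed to page.evaluate() runs inside the browser,
// so it can use document; its return value is serialized back to Node.js.
async function getPageSummary(page) {
  return page.evaluate(() => {
    const links = document.querySelectorAll('a');
    return {
      title: document.title,
      linkCount: links.length,
      firstLink: links.length > 0 ? links[0].href : null,
    };
  });
}
```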

We can use the page.$() method to access the Selectors API method querySelector() on the document, and page.$$() as an alias to querySelectorAll().


Once we are done with our calculations, we call the close() method on browser:

await browser.close()
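If anything throws between launch() and close(), the Chrome process can be left running. A common guard is a try/finally block. A minimal sketch: withPage is a hypothetical helper name, and it takes the puppeteer module as a parameter so it can be reused and tested easily.

```javascript
// Launch a browser, open url in a new page, run task(page),
// and always close the browser, even if task throws.
async function withPage(puppeteer, url, task) {
  const browser = await puppeteer.launch();
  try {
    const page = await browser.newPage();
    await page.goto(url);
    return await task(page);
  } finally {
    await browser.close();
  }
}

// Usage:
//   const puppeteer = require('puppeteer')
//   withPage(puppeteer, 'https://website.com', page => page.title()).then(console.log)
```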



Page methods

We saw above the page object we get from calling browser.newPage(), and we called the goto() and evaluate() methods on it.


Most of these methods return a promise, so they are normally prefixed with the await keyword.


Let’s see some of the most common methods we will call. You can see the full list in the Puppeteer docs.

page.$()

Gives access to the Selectors API method querySelector() on the page. It resolves to an ElementHandle for the first matching element, or to null if nothing matches.

page.$$()

Gives access to the Selectors API method querySelectorAll() on the page. It resolves to an array of ElementHandles, which is empty if nothing matches.

page.$eval()

Accepts 2 or more parameters. The first is a selector, the second a function. If there are more parameters, those are passed as additional arguments to the function.

It runs querySelector() on the page using the first parameter as the selector, then passes the matching element (not the selector) as the first argument to the function. If no element matches, it throws an error.

const innerTextOfButton = await page.$eval('button#submit', el => el.innerText)
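To work on every match rather than only the first, there is also page.$$eval(). It runs querySelectorAll() and passes the resulting array of elements to the function. The sketch below also shows the extra-arguments feature: the prefix string is forwarded to the in-page function. The helper name, the a selector and the https:// filter are illustrative assumptions.

```javascript
// Return the href of every link on the page that starts with a prefix.
// page.$$eval() passes an array of matching elements as the first argument;
// any extra arguments (here: prefix) are passed after it.
async function getLinksStartingWith(page, prefix) {
  return page.$$eval(
    'a',
    (links, wanted) => links.map((a) => a.href).filter((href) => href.startsWith(wanted)),
    prefix
  );
}

// Usage:
//   const secureLinks = await getLinksStartingWith(page, 'https://')
```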

click()

Performs a mouse click on the element matching the selector passed as parameter


await page.click('button#submit')

We can pass an additional argument with an object of options:


  • button can be set to left (default), right or middle


  • clickCount is a number that defaults to 1 and sets how many times the element should be clicked


  • delay is the number of milliseconds to wait between mousedown and mouseup. Default is 0

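Combining these options, a double-click or a right-click can be sketched as below. doubleClick and rightClick are hypothetical convenience helpers, not part of Puppeteer.

```javascript
// Hypothetical helpers built on page.click() and its options object.
async function doubleClick(page, selector) {
  // clickCount: 2 reports a click count of 2, which is how a double-click is simulated
  await page.click(selector, { clickCount: 2 });
}

async function rightClick(page, selector) {
  // button: 'right' sends a right-button click instead of the default left one
  await page.click(selector, { button: 'right' });
}
```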

content()

Get the HTML source of a page


const source = await page.content()

emulate()

Emulates a device. It sets the user agent to a specific device, and sets the viewport accordingly.

The list of devices supported is available in this file.


Here’s how you emulate an iPhone X:


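const puppeteer = require('puppeteer');
// Older Puppeteer versions ship this module; in recent versions it may not exist,
// and require('puppeteer').KnownDevices['iPhone X'] is used instead
const device = require('puppeteer/DeviceDescriptors')['iPhone X'];

puppeteer.launch().then(async browser => {
  const page = await browser.newPage()
  await page.emulate(device)
  // do stuff
  await browser.close()
})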

evaluate()

Evaluates a function in the page context. Inside this function we have access to the document object, so we can call any DOM API:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  const result = await page.evaluate(() => {
    return document.querySelectorAll('.footer-tags a').length
  })
  console.log(result)
  await browser.close()
})()

Anything we call in here is executed in the page context, so if we run console.log(), we won’t see the result in the Node.js context because that’s executed in the headless browser.


We can calculate values here and return a JavaScript object, but if we want to return a DOM element and access it in the Node.js context, we must use a different method, evaluateHandle(). If we return a DOM element from evaluate(), we’ll just get an empty object.


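evaluateHandle()

Similar to evaluate(), but instead of a serialized copy it returns a handle (a JSHandle, or an ElementHandle when the value is a single element) that points to the object inside the page. So if we return a DOM element, we get something we can keep working with rather than an empty object: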


const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  const result = await page.evaluateHandle(() => {
    return document.querySelectorAll('.footer-tags a')
  })
  console.log(result)
  await browser.close()
})()
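A handle is usually passed back into evaluate() to read data from the element it points to. Once it is no longer needed, it is disposed of so the browser can release the object. A minimal sketch, assuming the page has an h1 element; getHeadingText is a hypothetical helper name:

```javascript
// Get a handle to the first <h1>, read its text in the page context,
// then release the handle.
async function getHeadingText(page) {
  const handle = await page.evaluateHandle(() => document.querySelector('h1'));
  try {
    // Handles passed as arguments to evaluate() arrive in the page as the real object
    return await page.evaluate((el) => (el ? el.textContent : null), handle);
  } finally {
    await handle.dispose();
  }
}
```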

exposeFunction()

This method adds a new function to the browser context (on the page’s window object), whose body is executed in the Node.js context.


This means we can add a function that runs Node.js code inside the browser.


This example adds a test() function inside the browser context that reads an “app.js” file from the file system. Since the path is relative, it is resolved against the current working directory of the Node.js process:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  // test() is callable from the page, but its body runs in Node.js
  await page.exposeFunction('test', () => {
    const loadData = (path) => {
      try {
        return fs.readFileSync(path, 'utf8')
      } catch (err) {
        console.error(err)
        return false
      }
    }
    return loadData('app.js')
  })
  const result = await page.evaluate(() => {
    return test()
  })
  console.log(result)
  await browser.close()
})()
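Exposed functions can also bring in Node.js APIs the page has no access to. As a sketch, the helper below exposes an md5() function backed by Node’s crypto module. The names md5 and exposeMd5 are our own choice. Inside the page the exposed function returns a Promise, so callers inside evaluate() should await it.

```javascript
const crypto = require('crypto');

// Hash a string with Node's crypto module (runs in Node.js).
function md5(text) {
  return crypto.createHash('md5').update(text).digest('hex');
}

// Make md5() callable from the page as window.md5(), which returns a Promise there.
async function exposeMd5(page) {
  await page.exposeFunction('md5', md5);
}

// Usage, after exposeMd5(page):
//   const hash = await page.evaluate(async () => await window.md5('hello'))
```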

focus()

Focuses the element matching the selector passed as parameter


await page.focus('input#name')

goBack()

Goes back in the page navigation history


await page.goBack()

goForward()

Goes forward in the page navigation history


await page.goForward()

goto()

Navigates the page to the given URL.


await page.goto('https://flaviocopes.com')

You can pass an object with options as a second parameter. For example, setting the waitUntil option to networkidle2 makes goto() consider the navigation finished only when there have been no more than 2 network connections for at least 500 ms:

await page.goto('https://flaviocopes.com', {waitUntil: 'networkidle2'})
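goto() resolves to the main response of the navigation. It does not throw for HTTP error statuses such as 404, so checking the status before scraping is often worthwhile. The response can also be null, for example when navigating to the same URL with a different hash. A sketch: openOrThrow is a hypothetical helper, and networkidle2 is just one of the accepted waitUntil values (others include load, domcontentloaded and networkidle0).

```javascript
// Navigate and fail fast on HTTP errors such as 404 or 500.
async function openOrThrow(page, url) {
  const response = await page.goto(url, { waitUntil: 'networkidle2' });
  // response is null for some navigations, e.g. same-document hash changes
  if (response && !response.ok()) {
    throw new Error(`Failed to load ${url}: HTTP ${response.status()}`);
  }
  return response;
}
```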

hover()

Moves the mouse over the element matching the selector passed as parameter


await page.hover('input#name')

pdf()

Generates a PDF from a page:

await page.pdf({ path: 'file.pdf' })

You can pass many options to this method to customize the generated PDF. See the official docs.
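As an illustration of those options, the sketch below produces an A4 PDF with background colors and 1 cm margins. format, printBackground and margin are documented pdf() options. The helper name savePdf and the specific values are arbitrary. Note that PDF generation has historically been supported only in headless mode.

```javascript
// Save the current page as an A4 PDF, keeping CSS backgrounds.
// Resolves to the PDF data as well as writing it to path.
async function savePdf(page, path) {
  return page.pdf({
    path,
    format: 'A4',
    printBackground: true, // include background colors and images
    margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' },
  });
}
```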

reload()

Reload a page


await page.reload()

screenshot()

Takes a screenshot of the page (PNG by default) and saves it to the file set with the path option.


await page.screenshot({path: 'screenshot.png'})

See the official docs for all the options.
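Two frequently useful options are fullPage and type. fullPage captures the whole scrollable page rather than just the viewport. type selects the output format; JPEG output can be combined with quality. A sketch; fullPageJpeg is a hypothetical helper name and the quality value is arbitrary:

```javascript
// Capture the entire scrollable page as a JPEG.
// quality (0-100) only applies to JPEG output, not PNG.
async function fullPageJpeg(page, path) {
  return page.screenshot({ path, fullPage: true, type: 'jpeg', quality: 80 });
}
```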


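select()

Selects one or more options in the <select> element matching the selector, and triggers the change and input events. The values to select are passed as additional arguments, and the method resolves to an array of the option values that were selected:

await page.select('select#color', 'blue')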

setContent()

You can set the content of a page, rather than opening an existing web page.


Useful to programmatically generate PDFs or screenshots with existing HTML:


const html = '<h1>Hello!</h1>'
await page.setContent(html)
await page.pdf({path: 'hello.pdf'})
await page.screenshot({path: 'screenshot.png'})

setViewport()

By default the viewport is 800x600px. If you want a different viewport, for example to take a screenshot, call setViewport() passing an object with width and height properties.

await page.setViewport({ width: 1280, height: 800 })

title()

Get the page title


await page.title()

type()

Types text into the form element matching the selector


await page.type('input#name', 'Flavio')

The delay option lets you simulate typing like a real-world user, adding a delay (in milliseconds) between each character:


await page.type('input#name', 'Flavio', {delay: 100})
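Putting type() and click() together gives the usual form-submission flow. Clicking submit triggers a navigation, so a common pattern is to start waitForNavigation() before the click and await both with Promise.all(). That way the navigation event isn’t missed. A sketch: submitForm is a hypothetical helper, and the selectors and values in the usage line are illustrative.

```javascript
// Fill in a form and submit it, waiting for the resulting page load.
// fields maps CSS selectors to the text to type into them.
async function submitForm(page, fields, submitSelector) {
  for (const [selector, text] of Object.entries(fields)) {
    await page.type(selector, text, { delay: 50 });
  }
  // Start waiting before clicking so a fast navigation is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}

// Usage:
//   await submitForm(page, { 'input#name': 'Flavio' }, 'button#submit')
```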

url()

Get the page URL. Unlike most page methods, this one is synchronous, so there’s no need to await it:

const url = page.url()

viewport()

Get the page viewport. This method is synchronous too:

const viewport = page.viewport()

waitFor()

Wait for something specific to happen. In newer Puppeteer versions waitFor() itself is deprecated in favor of these more explicit methods:

  • waitForFunction


  • waitForNavigation


  • waitForRequest


  • waitForResponse


  • waitForSelector


  • waitForXPath




// Runs inside the page, so it uses document rather than the page object
const waitForNameToBeFilled = () => document.querySelector('input#name').value != ''
await page.waitForFunction(waitForNameToBeFilled)
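The shortcut methods cover most cases. The sketch below first waits for an element to become visible, then waits for a condition evaluated in the page. The selector is passed through as an argument because the function runs in the browser, where Node.js variables are not in scope. The helper name waitForFilledInput and the 5 second timeout are assumptions.

```javascript
// Wait until the input exists and is visible, then until it has a value.
async function waitForFilledInput(page, selector, timeout = 5000) {
  await page.waitForSelector(selector, { visible: true, timeout });
  // waitForFunction(fn, options, ...args): fn runs in the page and is polled
  // until it returns a truthy value; extra args are passed to fn
  await page.waitForFunction(
    (sel) => document.querySelector(sel).value !== '',
    { timeout },
    selector
  );
}
```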

Page namespaces

A page object gives you access to several different objects:


  • accessibility


  • coverage


  • keyboard


  • mouse


  • touchscreen


  • tracing


Each of those unlocks a whole lot of new functionality.


keyboard and mouse are probably the ones you’ll use the most when trying to automate things.

For example, this is how you trigger typing into an element (which should have been focused previously):


await page.keyboard.type('hello!')

Other keyboard methods are


  • keyboard.down() to send a keydown event


  • keyboard.press() to send a keydown followed by a keyup (simulating a normal key press). It’s handy for special keys such as Enter; to hold down a modifier key (Shift, Control, Meta) while pressing other keys, use keyboard.down() and keyboard.up()

  • keyboard.sendCharacter() sends a keypress and an input event, without keydown or keyup


  • keyboard.type() sends a keydown, keypress and keyup event for each character of the text


  • keyboard.up() to send a keyup event


Key names such as the ones used by down(), press() and up() are defined in the US Keyboard Layout file: https://github.com/GoogleChrome/puppeteer/blob/master/lib/USKeyboardLayout.js. Normal characters and numbers are typed as-is, while special keys have their own names, such as Enter, Backspace or ArrowLeft.
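As a sketch of combining these methods, the helper below clears the focused text field. It holds a modifier with down(), presses KeyA to select everything, releases the modifier with up(), then presses Backspace. The helper name clearFocusedField is hypothetical. It assumes a platform where Control+A selects all; on macOS the equivalent shortcut uses Meta, and keyboard shortcuts there may not behave the same, so treat that case as something to verify.

```javascript
// Select everything in the focused field and delete it.
// modifier defaults to 'Control'; 'Meta' is the macOS equivalent (unverified here).
async function clearFocusedField(page, modifier = 'Control') {
  await page.keyboard.down(modifier);
  await page.keyboard.press('KeyA');
  await page.keyboard.up(modifier);
  await page.keyboard.press('Backspace');
}
```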

mouse offers 4 methods:


  • mouse.click() to simulate a click: mousedown and mouseup events


  • mouse.down() to simulate a mousedown event


  • mouse.move() to move to different coordinates


  • mouse.up() to simulate a mouseup event
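These primitives combine into gestures that have no single method, such as dragging. The sketch below assumes you already know the coordinates, for example from an element’s boundingBox(). The steps option of move() emits intermediate mousemove events, so the page sees a gradual drag. dragBetween is a hypothetical helper. It targets pages that implement dragging with mouse events (sliders, canvases); native HTML5 drag and drop may not be triggered by mouse events alone.

```javascript
// Drag from one point to another with the mouse.
// from and to are { x, y } objects in CSS pixels relative to the viewport.
async function dragBetween(page, from, to, steps = 10) {
  await page.mouse.move(from.x, from.y);
  await page.mouse.down();
  // steps > 1 sends intermediate mousemove events along the way
  await page.mouse.move(to.x, to.y, { steps });
  await page.mouse.up();
}
```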


Source: https://flaviocopes.com/puppeteer/



