Introduction to Puppeteer

Puppeteer is a Node library that we can use to control a headless Chrome instance. We are basically using Chrome, but programmatically using JavaScript.

Using it, we can:

scrape web pages
automate form submissions
perform any kind of browser automation
track page loading performance
create server-side rendered versions of single page apps
make screenshots
create automated tests
generate PDFs from web pages

It’s built by Google. It does not unlock anything new, per se, but it abstracts away many of the nitty-gritty details we would otherwise have to deal with ourselves.

In short, it makes things very easy.

Since it spins up a new Chrome instance when it’s initialized, it might not be the most performant. It’s the most precise way to automate testing with Chrome though, since it’s using the actual browser under the hood.

To be precise, it uses Chromium, the open source part of Chrome. This mostly means you don’t get the proprietary codecs that Google licenses and can’t open source (MP3, AAC, H.264...), nor the integration with Google services such as crash reports and Google Update. From a programmatic standpoint it should behave just like Chrome (except for media playback, as noted).

Installing Puppeteer

Start by installing it in your project using

npm install puppeteer

This will download and bundle the latest version of Chromium.

You can opt to make puppeteer run the local installation of Chrome you already have installed by installing puppeteer-core instead, which is useful in some special cases (see puppeteer vs puppeteer-core). Usually, you’d just go with puppeteer.

Using Puppeteer

In a Node.js file, require it:

const puppeteer = require('puppeteer');

then we can use the launch() method to create a browser instance:

(async () => {
  const browser = await puppeteer.launch()
})()

We can write like this, too:

puppeteer.launch().then(async browser => {
  //...
})

You can pass an object with options to puppeteer.launch(). The most common one is

puppeteer.launch({ headless:false })

to show Chrome while Puppeteer is performing its operations. It can be nice to see what’s happening and debug.

We use await, and so we must wrap this method call in an async function, which we immediately invoke.

Next we can use the newPage() method on the browser object to get the page object:

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
})()

Next up we call the goto() method on the page object to load that page:

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://website.com')
})()

We could use promises as well, instead of async/await, but using the latter makes things much more readable:

(() => {
  puppeteer.launch().then(browser => {
    browser.newPage().then(page => {
      page.goto('https://website.com').then(() => {
        //...
      })
    })
  })
})()

Getting the page content

Once we have a page loaded with a URL, we can get the page content by calling the evaluate() method of page:

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://website.com')
  const result = await page.evaluate(() => {
    //...
  })
})()

This method takes a callback function, where we can add the code needed to retrieve the elements of the page we need. We return a new object, and this will be the result of our evaluate() method call.

We can use the page.$() method to run the Selectors API method querySelector() on the document, and page.$$() to run querySelectorAll().

Once we are done with our calculations, we call the close() method on browser:

browser.close()
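
The steps so far (launch, open a page, navigate, run some code, close) can be sketched as one helper. This is a minimal sketch rather than the article's own code: `withPage` is a hypothetical name, the `puppeteer` module is passed in as a parameter, and the `try`/`finally` is an addition so the browser is closed even when the task throws.

```javascript
// Hypothetical helper: runs `task(page)` against `url` and always closes the browser.
// `puppeteer` is the module returned by require('puppeteer').
async function withPage(puppeteer, url, task) {
  const browser = await puppeteer.launch()
  try {
    const page = await browser.newPage()
    await page.goto(url)
    // Whatever `task` resolves to becomes the result of withPage()
    return await task(page)
  } finally {
    // Runs on success and on failure, so no browser process is left behind
    await browser.close()
  }
}
```

It could be called as withPage(require('puppeteer'), 'https://website.com', page => page.title()).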

Page methods

We saw above the page object we get from calling browser.newPage(), and we called the goto() and evaluate() methods on it.

Most methods return a promise, so calls to them are normally prefixed with the await keyword.

Let’s see some of the most common methods we will call. You can see the full list on the Puppeteer docs.

`page.$()`

Gives access to the Selectors API method querySelector() on the page

`page.$$()`

Gives access to the Selectors API method querySelectorAll() on the page
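
As a small illustration of the two selector methods, the sketch below checks whether a page contains links and counts them. `inspectLinks` is a hypothetical helper name, and it assumes `page` is a Puppeteer page that has already navigated somewhere.

```javascript
// Hypothetical helper: `page` is a Puppeteer page that has already loaded a URL.
async function inspectLinks(page) {
  // page.$() resolves to a handle for the first match, or null when nothing matches
  const firstLink = await page.$('a')
  // page.$$() resolves to an array of handles (empty when nothing matches)
  const allLinks = await page.$$('a')
  return { hasLink: firstLink !== null, count: allLinks.length }
}
```

Both calls return handles to elements that live in the browser, not the elements themselves, so their properties have to be read through Puppeteer.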

`page.$eval()`

Accepts 2 or more parameters. The first is a selector, the second a function. If there are more parameters, those are passed as additional arguments to the function.

It runs querySelector() on the page, using the first parameter as selector, then passes the matched element as the first argument to the function.

const innerTextOfButton = await page.$eval('button#submit', el => el.innerText)
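
The additional arguments can be sketched like this. `readAttribute` is a hypothetical helper, not something Puppeteer provides; it only relies on page.$eval() forwarding its extra arguments to the callback.

```javascript
// Hypothetical helper: reads attribute `name` from the first element matching `selector`.
async function readAttribute(page, selector, name) {
  // `name` is forwarded to the callback as its second argument;
  // the callback itself runs in the page, not in Node.js
  return page.$eval(selector, (el, attr) => el.getAttribute(attr), name)
}
```

For example, readAttribute(page, 'a', 'href') would resolve to the href of the first link on the page.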

`click()`

Performs a mouse click on the element matching the selector passed as parameter

await page.click('button#submit')

We can pass an additional argument with an object of options:

button can be set to left (default), right or middle
clickCount is a number that defaults to 1 and sets how many times the element should be clicked
delay is the number of milliseconds to wait between mousedown and mouseup. Default is 0
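
A short sketch of how these options might be combined. The helper names `doubleClick` and `rightClick` are hypothetical, and the 100 ms delay is an arbitrary value chosen for illustration.

```javascript
// Hypothetical helpers built on page.click() and its options object.
async function doubleClick(page, selector) {
  // clickCount: 2 reports a double click; delay is expressed in milliseconds
  await page.click(selector, { button: 'left', clickCount: 2, delay: 100 })
}

async function rightClick(page, selector) {
  // Options that are left out keep their defaults (clickCount 1, delay 0)
  await page.click(selector, { button: 'right' })
}
```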

`content()`

Gets the HTML source of a page

const source = await page.content()

`emulate()`

Emulates a device. It sets the user agent to a specific device, and sets the viewport accordingly.

The list of supported devices is available in the DeviceDescriptors file of the Puppeteer package.

Here’s how you emulate an iPhone X:

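const puppeteer = require('puppeteer');
// This module path is the one older Puppeteer releases ship; later releases
// expose the same descriptors on the puppeteer object (for example puppeteer.devices)
const device = require('puppeteer/DeviceDescriptors')['iPhone X'];

puppeteer.launch().then(async browser => {
  const page = await browser.newPage()
  await page.emulate(device)
  //do stuff
  await browser.close()
})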

`evaluate()`

Evaluates a function in the page context. Inside this function we have access to the document object, so we can call any DOM API:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  // The callback runs inside the page; only its return value travels back to Node.js
  const result = await page.evaluate(() => {
    return document.querySelectorAll('.footer-tags a').length
  })
  console.log(result)
  await browser.close()
})()

Anything we call in here is executed in the page context, so if we run console.log(), we won’t see the result in the Node.js context because that’s executed in the headless browser.
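
If that output is needed on the Node.js side, one option is to listen for the page's console event and re-print each message. This is a minimal sketch: `forwardConsole` and the `[page]` prefix are made up for illustration, and the listener should be registered before the code that logs is evaluated.

```javascript
// Hypothetical helper: re-prints console messages from the page in the Node.js process.
function forwardConsole(page, log = console.log) {
  // Puppeteer emits a 'console' event for each console.* call made inside the page
  page.on('console', msg => log(`[page] ${msg.text()}`))
}
```

After calling forwardConsole(page), a console.log('hi') executed inside page.evaluate() would be printed by Node.js as [page] hi.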

We can calculate values here and return a JavaScript object, but if we want to return a DOM element and access it in the Node.js context, we must use a different method, evaluateHandle(). If we return a DOM element from evaluate(), we’ll just get an empty object.

`evaluateHandle()`

Similar to evaluate(), but if we return a DOM element, we’ll get the proper object back rather than an empty object:

const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  // result is a handle that refers to the NodeList inside the page
  const result = await page.evaluateHandle(() => {
    return document.querySelectorAll('.footer-tags a')
  })
  console.log(result)
  await browser.close()
})()
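
A handle is mostly useful when it is passed back into the page. The sketch below, under the assumption that `page` has already loaded a URL, hands the handle to evaluate() to read a value from it and then disposes of it. `countMatches` is a hypothetical helper name.

```javascript
// Hypothetical helper: counts the elements matching `selector` through a handle.
async function countMatches(page, selector) {
  // The handle refers to the NodeList that lives inside the page
  const handle = await page.evaluateHandle(sel => document.querySelectorAll(sel), selector)
  try {
    // Passed as an argument to evaluate(), the handle is the real object again in the page
    return await page.evaluate(list => list.length, handle)
  } finally {
    // Release the reference so the page can garbage-collect the object
    await handle.dispose()
  }
}
```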

`exposeFunction()`

This method allows you to add a new function to the browser context that is executed in the Node.js context.

This means code running in the page can call a function whose body executes Node.js code.

This example adds a test() function to the browser context that reads an “app.js” file from the file system, with the path relative to the current working directory:

const puppeteer = require('puppeteer');
const fs = require('fs');

(async () => {
  const browser = await puppeteer.launch()
  const page = await browser.newPage()
  await page.goto('https://flaviocopes.com')
  // test() becomes callable inside the page, but its body runs in Node.js
  await page.exposeFunction('test', () => {
    const loadData = (path) => {
      try {
        return fs.readFileSync(path, 'utf8')
      } catch (err) {
        console.error(err)
        return false
      }
    }
    return loadData('app.js')
  })
  const result = await page.evaluate(() => {
    return test()
  })
  console.log(result)
  await browser.close()
})()

`focus()`

Focuses the element matching the selector passed as parameter

await page.focus('input#name')

`goBack()`

Goes back in the page navigation history

await page.goBack()

`goForward()`

Goes forward in the page navigation history

await page.goForward()

`goto()`

Navigates the page to the given URL.

await page.goto('https://flaviocopes.com')

You can pass an object with options as a second parameter. With the waitUntil option set to networkidle2, the navigation is considered complete once there have been no more than 2 network connections for at least 500 ms:

await page.goto('https://flaviocopes.com', {waitUntil: 'networkidle2'})

`hover()`

Hovers the mouse over the element matching the selector passed as parameter

await page.hover('input#name')

`pdf()`

Generates a PDF from a page and saves it to the file given in path:

await page.pdf({ path: 'file.pdf' })

You can pass many options to this method, to set the generated PDF details. See the official docs.
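
A few of those options are sketched below. The helper name `savePdf` and the chosen values (A4, 1cm margins) are assumptions made for illustration; path, format, printBackground and margin are options of page.pdf().

```javascript
// Hypothetical helper: saves the current page as an A4 PDF at `path`.
async function savePdf(page, path) {
  return page.pdf({
    path,                                  // where the file is written
    format: 'A4',                          // paper format
    printBackground: true,                 // include CSS backgrounds
    margin: { top: '1cm', bottom: '1cm' }  // margins accept values with units
  })
}
```

The call also resolves to the PDF data itself, so the result can be used without reading the file back.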

`reload()`

Reloads a page

await page.reload()

`screenshot()`

Takes a PNG screenshot of the page, saving it to the filename selected using path.

await page.screenshot({path: 'screenshot.png'})

See the official docs for all the options.

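`select()`

Selects one or more options in the select element identified by the selector passed as first parameter. The values to select are passed as additional parameters (the selector and value below are placeholders):

await page.select('select#country', 'IT')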

`setContent()`

You can set the content of a page, rather than opening an existing web page.

Useful to programmatically generate PDFs or screenshots with existing HTML:

const html = '<h1>Hello!</h1>'
await page.setContent(html)
await page.pdf({path: 'hello.pdf'})
await page.screenshot({path: 'screenshot.png'})

`setViewport()`

By default the viewport is 800x600px. If you want a different viewport, for example to take a screenshot, call setViewport() passing an object with width and height properties.

await page.setViewport({ width: 1280, height: 800 })
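
Setting the viewport and taking a screenshot often go together. The sketch below is one way to combine them; `captureAt` is a hypothetical helper, and fullPage is a screenshot() option that captures the whole scrollable page rather than only the visible area.

```javascript
// Hypothetical helper: resizes the viewport, then saves a full-page screenshot.
async function captureAt(page, width, height, path) {
  // The viewport is set first so the page is laid out at the requested size
  await page.setViewport({ width, height })
  await page.screenshot({ path, fullPage: true })
}
```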

`title()`

Gets the page title

await page.title()

`type()`

Types text into the form element matching the selector

await page.type('input#name', 'Flavio')

The delay option lets you simulate typing like a real user by adding a delay, in milliseconds, between each character:

await page.type('input#name', 'Flavio', {delay: 100})

`url()`

Gets the page URL

await page.url()

`viewport()`

Gets the page viewport

await page.viewport()

`waitFor()`

Waits for something specific to happen. It has the following shortcut functions, which later Puppeteer releases recommend calling directly instead of waitFor():

waitForFunction
waitForNavigation
waitForRequest
waitForResponse
waitForSelector
waitForXPath

Example:

// The function is evaluated inside the page, so it uses document rather than page
const waitForNameToBeFilled = () => document.querySelector('input#name').value != ''
await page.waitFor(waitForNameToBeFilled)
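
One of the shortcut functions in use: the sketch below waits for a form field to appear before typing into it. `fillWhenReady` is a hypothetical helper and the 5000 ms timeout is an arbitrary choice.

```javascript
// Hypothetical helper: waits for `selector` to exist, then types `text` into it.
async function fillWhenReady(page, selector, text) {
  // Resolves once the element is in the DOM; rejects if the timeout (in ms) passes first
  await page.waitForSelector(selector, { timeout: 5000 })
  await page.type(selector, text)
}
```

This avoids typing into a field that the page has not rendered yet, which is common on pages that build their forms with JavaScript.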

Page namespaces

A page object gives you access to several different objects:

accessibility
coverage
keyboard
mouse
touchscreen
tracing

Each of those unlocks a whole lot of new functionality.

keyboard and mouse are most probably the ones you’ll use the most when trying to automate things.

For example, this is how you trigger typing into an element (which should have been focused previously):

await page.keyboard.type('hello!')

Other keyboard methods are

keyboard.down() to send a keydown event
keyboard.press() to send a keydown followed by a keyup (simulating a normal key press). Used mainly for modifier keys (shift, ctrl, cmd)
keyboard.sendCharacter() to send a keypress event
keyboard.type() to send keydown, keypress and keyup events for each character
keyboard.up() to send a keyup event

All those receive a keyboard key code as defined in the US Keyboard Layout file: https://github.com/GoogleChrome/puppeteer/blob/master/lib/USKeyboardLayout.js. Normal characters and numbers are typed as-is, while special keys have a special code to define them.
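
As an illustration of special key codes, the sketch below holds Shift while pressing the left arrow to select the last characters of the focused field, deletes them, and types a replacement. `replaceTail` is a hypothetical helper; Shift, ArrowLeft and Backspace are key names from the US keyboard layout, and the cursor is assumed to be at the end of the field.

```javascript
// Hypothetical helper: replaces the last `count` characters of the focused field with `text`.
async function replaceTail(page, count, text) {
  // Hold Shift so every ArrowLeft press extends the selection by one character
  await page.keyboard.down('Shift')
  for (let i = 0; i < count; i++) {
    await page.keyboard.press('ArrowLeft')
  }
  await page.keyboard.up('Shift')
  // Delete the selection, then type the replacement
  await page.keyboard.press('Backspace')
  await page.keyboard.type(text)
}
```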

mouse offers 4 methods:

mouse.click() to simulate a click: mousedown and mouseup events
mouse.down() to simulate a mousedown event
mouse.move() to move to different coordinates
mouse.up() to simulate a mouseup event
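
Three of these methods can be combined into a drag gesture. This is a rough sketch: `drag` is a hypothetical helper, `from` and `to` are plain objects with x and y coordinates relative to the viewport, and the steps value of 10 is arbitrary.

```javascript
// Hypothetical helper: drags from one point of the viewport to another.
async function drag(page, from, to) {
  await page.mouse.move(from.x, from.y)
  await page.mouse.down()
  // `steps` makes Puppeteer send intermediate mousemove events along the way
  await page.mouse.move(to.x, to.y, { steps: 10 })
  await page.mouse.up()
}
```

Whether the page reacts to it as a drag depends on how the page implements dragging, since only mouse events are sent.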

Translated from: https://flaviocopes.com/puppeteer/
