Puppeteer Integration Tutorial
Puppeteer is a Node.js library developed by Google for controlling Chrome/Chromium browsers.
Installation
```bash
npm install puppeteer
# Or use puppeteer-core, which does not download a browser automatically (point it at an existing Chrome install via executablePath)
npm install puppeteer-core
```
Basic Usage
HTTP Proxy
```javascript
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({
    args: [
      '--proxy-server=proxy.okkproxy.com:8080'
    ]
  });

  const page = await browser.newPage();

  // Supply the proxy credentials (call this before page.goto)
  await page.authenticate({
    username: 'your_username',
    password: 'your_password'
  });

  await page.goto('https://httpbin.org/ip');
  const content = await page.content();
  console.log(content); // Should show the proxy's exit IP rather than your own

  await browser.close();
})();
```
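Chromium does not pick up credentials embedded in `--proxy-server`, which is why the example above passes them separately through `page.authenticate()`. If your provider hands you a single URL such as `http://user:pass@host:port`, a small helper can split it into the two pieces. This is a sketch only: `parseProxyUrl` is a hypothetical helper (not a Puppeteer API), and the single-URL credential format is an assumption about how your credentials are delivered. It relies only on Node's built-in WHATWG `URL` class.

```javascript
// Hypothetical helper (not a Puppeteer API): split a proxy URL such as
// "http://user:pass@proxy.okkproxy.com:8080" into a Chromium launch flag
// and the credentials to pass to page.authenticate().
function parseProxyUrl(proxyUrl) {
  const url = new URL(proxyUrl); // Node's built-in WHATWG URL parser

  // URL keeps user info percent-encoded, so decode it before use.
  const credentials = url.username
    ? {
        username: decodeURIComponent(url.username),
        password: decodeURIComponent(url.password),
      }
    : null;

  return {
    // Only scheme, host and port go into --proxy-server; user info is dropped.
    launchArg: `--proxy-server=${url.protocol}//${url.host}`,
    credentials,
  };
}

// Usage
const { launchArg, credentials } = parseProxyUrl(
  'http://your_username:your_password@proxy.okkproxy.com:8080'
);
console.log(launchArg); // --proxy-server=http://proxy.okkproxy.com:8080
console.log(credentials.username); // your_username
```

Put `launchArg` in the `args` array passed to `puppeteer.launch()`. When `credentials` is not null, call `await page.authenticate(credentials)` before `page.goto()`.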
SOCKS5 Proxy
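```javascript
const browser = await puppeteer.launch({
  args: [
    '--proxy-server=socks5://proxy.okkproxy.com:1080'
  ]
});
```
Note: Chromium does not support username/password authentication for SOCKS proxies, so `page.authenticate()` has no effect with this setup. Use a SOCKS5 endpoint that does not require credentials (for example, one that authorizes your IP address), or use the HTTP proxy above when credentials are required.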
Advanced Configuration
Setting the User-Agent
```javascript
const page = await browser.newPage();
await page.setUserAgent('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36');
await page.authenticate({
  username: 'your_username',
  password: 'your_password'
});
```
Setting Request Headers
```javascript
await page.setExtraHTTPHeaders({
  'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8',
  'Accept': 'text/html,application/xhtml+xml'
});
```
Disabling Image Loading
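```javascript
// Route every request through the handler below
await page.setRequestInterception(true);
page.on('request', (req) => {
  if (req.resourceType() === 'image') {
    req.abort(); // Drop image requests
  } else {
    req.continue(); // Every intercepted request must be aborted or continued, or it stalls
  }
});
```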
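To block several resource types at once (for example images, media and fonts, to save proxy bandwidth), you can build the handler from a set. This is a sketch: `createBlockingHandler` is a hypothetical name, and the default type list is only an example. The strings themselves are values that Puppeteer's `HTTPRequest.resourceType()` returns.

```javascript
// Hypothetical helper: build a 'request' listener that aborts every request
// whose resourceType() is in `blockedTypes` and lets everything else through.
function createBlockingHandler(blockedTypes = ['image', 'media', 'font']) {
  const blocked = new Set(blockedTypes);
  return (req) => {
    if (blocked.has(req.resourceType())) {
      req.abort();
    } else {
      req.continue();
    }
  };
}

// Usage with Puppeteer (inside an async function):
//   await page.setRequestInterception(true);
//   page.on('request', createBlockingHandler(['image', 'media', 'font']));
```

Keeping the list in a `Set` makes each lookup constant-time, and the same handler can be reused across pages.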
Proxy Rotation
```javascript
const puppeteer = require('puppeteer');

const proxies = [
  { server: 'proxy1.okkproxy.com:8080', user: 'user1', pass: 'pass1' },
  { server: 'proxy2.okkproxy.com:8080', user: 'user2', pass: 'pass2' },
];

// Launch a dedicated browser that sends all traffic through the given proxy
async function createBrowserWithProxy(proxyConfig) {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${proxyConfig.server}`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: proxyConfig.user,
    password: proxyConfig.pass
  });
  return { browser, page };
}

// Usage (wrapped in an async function, since CommonJS has no top-level await)
(async () => {
  const { browser, page } = await createBrowserWithProxy(proxies[0]);
  // ... work with page ...
  await browser.close();
})();
```
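The example above always uses `proxies[0]`. One simple way to actually rotate is round-robin: each call returns the next entry and wraps around after the last one. This is a sketch: `createProxyRotator` is a hypothetical helper, and round-robin is only one possible strategy (random choice or health-based selection are alternatives).

```javascript
// Hypothetical round-robin rotator: each call to next() returns the next
// proxy in the list, wrapping back to the first after the last one.
function createProxyRotator(proxies) {
  if (proxies.length === 0) {
    throw new Error('proxy list is empty');
  }
  let index = 0;
  return {
    next() {
      const proxy = proxies[index];
      index = (index + 1) % proxies.length;
      return proxy;
    },
  };
}

const rotator = createProxyRotator([
  { server: 'proxy1.okkproxy.com:8080', user: 'user1', pass: 'pass1' },
  { server: 'proxy2.okkproxy.com:8080', user: 'user2', pass: 'pass2' },
]);
console.log(rotator.next().server); // proxy1.okkproxy.com:8080
console.log(rotator.next().server); // proxy2.okkproxy.com:8080
console.log(rotator.next().server); // proxy1.okkproxy.com:8080
```

Each new browser can then be started with `createBrowserWithProxy(rotator.next())`. `--proxy-server` applies to the whole browser process, which is why `createBrowserWithProxy` launches a separate browser per proxy.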
Error Handling
```javascript
const puppeteer = require('puppeteer');

async function fetchWithRetry(url, maxRetries = 3) {
  for (let i = 0; i < maxRetries; i++) {
    let browser;
    try {
      browser = await puppeteer.launch({
        args: ['--proxy-server=proxy.okkproxy.com:8080']
      });
      const page = await browser.newPage();
      await page.authenticate({
        username: 'your_username',
        password: 'your_password'
      });
      await page.goto(url, { timeout: 30000 }); // Fail the attempt after 30 s
      // Await here so the content is read before the finally block closes the browser
      return await page.content();
    } catch (error) {
      console.log(`Attempt ${i + 1} failed:`, error.message);
      if (i === maxRetries - 1) throw error; // Give up after the last attempt
    } finally {
      // Close the browser on success and failure alike so Chrome processes don't leak
      if (browser) await browser.close();
    }
  }
}
```
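The retry loop above tries again immediately. When failures come from a busy or rate-limited proxy, waiting longer before each new attempt often helps. Below is a generic sketch of exponential backoff. `withRetry` and its `retries`/`baseDelayMs` options are hypothetical names, and the delay schedule (base × 2^attempt) is one common choice, not something Puppeteer prescribes.

```javascript
// Hypothetical generic retry helper with exponential backoff (not a Puppeteer API).
// `task` is any async function; it receives the zero-based attempt number.
async function withRetry(task, { retries = 3, baseDelayMs = 1000 } = {}) {
  if (retries < 1) {
    throw new RangeError('retries must be at least 1');
  }
  let lastError;
  for (let attempt = 0; attempt < retries; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      console.log(`Attempt ${attempt + 1} failed:`, error.message);
      if (attempt < retries - 1) {
        // Wait baseDelayMs, then 2x, 4x, ... before the next attempt
        const delayMs = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}
```

For example, call `withRetry(() => fetchOnce(url), { retries: 3, baseDelayMs: 1000 })`. Here `fetchOnce` stands for your own hypothetical single-attempt function that launches the browser, loads the page and closes the browser in a `finally` block.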
Performance Optimization
Headless Mode
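```javascript
const browser = await puppeteer.launch({
  headless: true, // No visible window, so fewer resources (already Puppeteer's default)
  args: [
    '--proxy-server=proxy.okkproxy.com:8080',
    '--disable-gpu',
    '--no-sandbox' // Disables Chrome's sandbox; only use in trusted environments such as containers
  ]
});
```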
Disabling Unnecessary Features
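```javascript
const browser = await puppeteer.launch({
  args: [
    '--proxy-server=proxy.okkproxy.com:8080',
    '--disable-dev-shm-usage',         // Use /tmp instead of /dev/shm, which is often too small in Docker
    '--no-sandbox',                    // --no-zygote only works together with --no-sandbox
    '--disable-setuid-sandbox',        // Do not use the setuid sandbox helper
    '--no-zygote',                     // Spawn child processes directly instead of forking from a zygote process
    '--disable-accelerated-2d-canvas', // Render 2D canvas in software instead of on the GPU
    '--disable-web-security'           // Turns off the same-origin policy: a security risk, not a speed-up; omit unless needed
  ]
});
```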
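Several of the flags above trade security for compatibility, so a common pattern is to add them only where they are needed. This is a sketch: `buildLaunchArgs` is a hypothetical helper (not a Puppeteer API), and the `inContainer` option is an assumption. You might derive it from an environment variable of your own choosing.

```javascript
// Hypothetical helper (not a Puppeteer API): assemble Chromium flags,
// adding the sandbox-related ones only when running in a trusted container.
function buildLaunchArgs({ proxyServer, inContainer = false } = {}) {
  const args = [];
  if (proxyServer) {
    args.push(`--proxy-server=${proxyServer}`);
  }
  if (inContainer) {
    args.push(
      '--disable-dev-shm-usage', // /dev/shm is often too small inside Docker
      '--no-sandbox',            // weakens isolation; acceptable only in a trusted container
      '--disable-setuid-sandbox'
    );
  }
  return args;
}

console.log(buildLaunchArgs({ proxyServer: 'proxy.okkproxy.com:8080' }).join(' '));
// --proxy-server=proxy.okkproxy.com:8080
```

The result goes straight into `puppeteer.launch({ args: buildLaunchArgs({ proxyServer: 'proxy.okkproxy.com:8080', inContainer: true }) })`, so local runs keep Chrome's sandbox enabled.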
Practical Examples
Screenshots
```javascript
const page = await browser.newPage();
await page.authenticate({
  username: 'your_username',
  password: 'your_password'
});
await page.goto('https://example.com');
await page.screenshot({ path: 'screenshot.png', fullPage: true });
```
Generating a PDF
```javascript
// Wait until there have been no network connections for at least 500 ms
await page.goto('https://example.com', { waitUntil: 'networkidle0' });
await page.pdf({ path: 'page.pdf', format: 'A4' });
```
Running JavaScript in the Page
```javascript
// The callback runs inside the browser page, not in Node.js
const title = await page.evaluate(() => document.title);
console.log('Page title:', title);
```
TypeScript Support
```typescript
import puppeteer, { Browser, Page } from 'puppeteer';

interface ProxyConfig {
  server: string;
  username: string;
  password: string;
}

async function launchWithProxy(config: ProxyConfig): Promise<{ browser: Browser; page: Page }> {
  const browser = await puppeteer.launch({
    args: [`--proxy-server=${config.server}`]
  });
  const page = await browser.newPage();
  await page.authenticate({
    username: config.username,
    password: config.password
  });
  return { browser, page };
}
```