Source: Shulou (Shulou.com), SLTechnology News&Howtos > Development. Reported 06/03; updated 2025-03-07.
This article introduces Puppeteer. Most people probably don't know much about it yet, so I'm sharing it for reference, and I hope you get something useful out of it.
Background
Starting with Chrome 59 (Linux, macOS) and Chrome 60 (Windows), Chrome ships with a headless mode (no UI), which makes automated testing and crawling easy. The open question is how to interact with Chrome in headless mode. Command-line arguments passed at startup only allow simple initialization. Selenium, WebDriver and similar tools are one solution, but they usually pull in many dependencies and are not lightweight.
Puppeteer is an official Node library from Google that controls headless Chrome over the DevTools Protocol. With the API Puppeteer provides, you can drive Chrome directly to simulate most user actions for UI testing, or visit pages as a crawler to collect data.
Environment and installation
Puppeteer itself requires Node 6.4 or later, but because async/await makes asynchronous code so much easier to write, Node 7.6 or later is recommended. In addition, headless Chrome has fairly high requirements for the versions of the system libraries on the server. CentOS keeps its system libraries conservative, and headless Chrome is hard to get running on CentOS 6. Upgrading those libraries can cause all kinds of server problems (including, but not limited to, losing SSH access), so it is best to use a server with a newer OS release.
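To check a machine against the version requirements above, a small version comparison is enough. This is a minimal sketch: `isAtLeast` and `checkNode` are hypothetical helper names, and the thresholds (6.4 and 7.6) are the ones stated in this article; current Puppeteer releases require a much newer Node.js.

```javascript
// Sketch: compare a Node.js version string against the minimums quoted in
// the article (6.4 for Puppeteer at the time, 7.6 for native async/await).
// Current Puppeteer releases need a much newer Node.js than this.

// Returns true if `version` (e.g. "v7.6.0" or "7.6.0") is >= `minimum`.
function isAtLeast(version, minimum) {
  const parse = v => String(v).replace(/^v/, '').split('.').map(n => parseInt(n, 10) || 0);
  const a = parse(version);
  const b = parse(minimum);
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    const x = a[i] || 0;
    const y = b[i] || 0;
    if (x !== y) return x > y; // the first differing component decides
  }
  return true; // all components are equal
}

// Defaults to the version of the Node.js process running this code.
function checkNode(version = process.versions.node) {
  return {
    meetsPuppeteerMinimum: isAtLeast(version, '6.4.0'),
    hasAsyncAwait: isAtLeast(version, '7.6.0'),
  };
}
```

Calling `console.log(checkNode())` on the server reports both checks for the installed Node.js.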
Puppeteer is an npm package, so it is easy to install:
npm i puppeteer
or
yarn add puppeteer
Installing Puppeteer also downloads a recent version of Chromium. You can skip this download by setting the PUPPETEER_SKIP_CHROMIUM_DOWNLOAD environment variable (or the equivalent npm config entry). If you skip the download, specify the location of an existing Chromium at startup with the executablePath option of puppeteer.launch([options]).
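When the Chromium download is skipped, the launch options need to carry `executablePath`. A minimal sketch, assuming the browser path is supplied through an environment variable; `CHROME_PATH` and `buildLaunchOptions` are hypothetical names chosen for this example, while `executablePath` is the real launch option described above.

```javascript
// Sketch: build puppeteer.launch() options for a machine where the bundled
// Chromium download was skipped. CHROME_PATH is a hypothetical environment
// variable name; executablePath is the actual launch option.
function buildLaunchOptions(env = process.env) {
  const options = {};
  if (env.CHROME_PATH) {
    // Use an existing Chrome/Chromium binary instead of the bundled one
    options.executablePath = env.CHROME_PATH;
  }
  return options;
}
```

With Puppeteer installed, it would be used as `const browser = await puppeteer.launch(buildLaunchOptions());`. If `CHROME_PATH` is unset the options stay empty and Puppeteer falls back to its default browser.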
Usage and examples
Like other frameworks, Puppeteer works through a Browser instance: you call methods on it, and the browser responds accordingly.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://rennaiqian.com');
  await page.screenshot({ path: 'example.png' });
  await page.pdf({ path: 'example.pdf', format: 'A4' });
  await browser.close();
})();
The code above creates a Browser instance with puppeteer.launch. The launch method accepts configuration options; when debugging locally it is useful to pass {headless: false} to turn off headless mode and see the browser window:
const browser = await puppeteer.launch({ headless: false });
The browser.newPage method opens a new tab and returns a page instance for it; the methods on page let you perform common operations on the page. The code above takes a screenshot and saves the page as a PDF.
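The screenshot and PDF steps can be wrapped in a reusable function that receives a page. This is a sketch under stated assumptions: `capturePage` and its `baseName` parameter are hypothetical, while `goto`, `screenshot` (with its `fullPage` option) and `pdf` are Page methods.

```javascript
// Sketch: the screenshot/PDF steps from the example, taking a Puppeteer Page.
// capturePage and baseName are hypothetical; goto, screenshot and pdf are
// Page methods.
async function capturePage(page, url, baseName) {
  await page.goto(url);
  // fullPage: true captures the whole scrollable page, not just the viewport
  await page.screenshot({ path: `${baseName}.png`, fullPage: true });
  // page.pdf() is documented as working in headless mode only
  await page.pdf({ path: `${baseName}.pdf`, format: 'A4' });
  return [`${baseName}.png`, `${baseName}.pdf`];
}
```

For example, `await capturePage(await browser.newPage(), 'http://rennaiqian.com', 'example')` writes example.png and example.pdf.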
A particularly powerful method is page.evaluate(pageFunction, ...args), which runs our own function inside the page, so the possibilities are almost unlimited.
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();
  await page.goto('http://rennaiqian.com');

  // Get the "viewport" of the page, as reported by the page.
  const dimensions = await page.evaluate(() => {
    return {
      width: document.documentElement.clientWidth,
      height: document.documentElement.clientHeight,
      deviceScaleFactor: window.devicePixelRatio
    };
  });

  console.log('Dimensions:', dimensions);
  await browser.close();
})();
Note that the function passed to evaluate runs inside the page, so it cannot directly use variables from the surrounding Node.js code; pass them in as extra arguments instead, and return a value to get the result back. At the time of writing, the project had been open source for only a little over a month and was very active, so check the API documentation for the version you use to make sure the parameters and usage are correct.
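The rule about external variables is easiest to see side by side. In this sketch, `countMatches` is a hypothetical helper; the point is that `selector` reaches the page only as an extra argument to `page.evaluate`, not through a closure.

```javascript
// Sketch: passing a Node.js value into page.evaluate(). The function runs in
// the browser, so it cannot see `selector` through a closure; it must be
// passed as an extra argument. countMatches is a hypothetical helper name.
async function countMatches(page, selector) {
  // Wrong: page.evaluate(() => document.querySelectorAll(selector).length)
  // throws inside the page, because `selector` is not defined there.
  return page.evaluate(sel => document.querySelectorAll(sel).length, selector);
}
```

Usage: `const n = await countMatches(page, '.result a');` returns the number of matching elements, computed in the page and sent back to Node.js.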
Debugging tips
1. Turn off headless mode. Sometimes it is useful to see what the browser is displaying. Launch a full (non-headless) browser with:
const browser = await puppeteer.launch({ headless: false });
2. Slow it down. The slowMo option slows down Puppeteer operations by the specified number of milliseconds. This is another way to see what is happening:
const browser = await puppeteer.launch({ headless: false, slowMo: 250 });
3. Capture console output by listening for the console event. This is also handy when debugging code inside page.evaluate:
page.on('console', msg => console.log('PAGE LOG:', msg.text()));
await page.evaluate(() => console.log(`url is ${location.href}`));
4. Enable verbose logging. All public API calls and internal protocol traffic are logged through the debug module under the puppeteer namespace:
# Basic verbose logging
env DEBUG="puppeteer:*" node script.js

# Debug output can be enabled/disabled by namespace
env DEBUG="puppeteer:*,-puppeteer:protocol" node script.js # everything BUT protocol messages
env DEBUG="puppeteer:session" node script.js # protocol session messages (protocol messages to targets)
env DEBUG="puppeteer:mouse,puppeteer:keyboard" node script.js # only Mouse and Keyboard API calls

# Protocol traffic can be rather noisy. This example filters out all Network domain messages
env DEBUG="puppeteer:*" env DEBUG_COLORS=true node script.js 2>&1 | grep -v '"Network'
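Tip 3 can be extended so that the log level of each page message is kept and uncaught page exceptions show up as well. A minimal sketch: `forwardPageConsole` and its `log` parameter are hypothetical, while the 'console' and 'pageerror' events and the ConsoleMessage methods `type()` and `text()` are part of Puppeteer's Page API.

```javascript
// Sketch: forward the page's console output to Node.js with its level.
// forwardPageConsole is a hypothetical helper; page.on('console'),
// page.on('pageerror') and ConsoleMessage.type()/text() are Puppeteer APIs.
function forwardPageConsole(page, log = console.log) {
  page.on('console', msg => {
    log(`PAGE ${msg.type().toUpperCase()}: ${msg.text()}`);
  });
  // Uncaught exceptions inside the page arrive as 'pageerror' events
  page.on('pageerror', err => {
    log(`PAGE ERROR: ${err.message}`);
  });
}
```

Call `forwardPageConsole(page)` right after `browser.newPage()` so that messages from the first navigation are captured too.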
Crawling in practice
Many websites use the user agent to detect the device, and page.emulate(options) can be used to simulate one. options has two fields: userAgent, and viewport, which can set the width, height, screen scale factor (deviceScaleFactor), whether the device is mobile (isMobile), and whether it supports touch events (hasTouch).
const puppeteer = require('puppeteer');
const devices = require('puppeteer/DeviceDescriptors');
const iPhone = devices['iPhone 6'];

puppeteer.launch().then(async browser => {
  const page = await browser.newPage();
  await page.emulate(iPhone);
  await page.goto('https://www.example.com');
  // other actions...
  await browser.close();
});
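The code above simulates an iPhone 6 visiting a website; devices holds emulation parameters for common devices that ship with Puppeteer. (The puppeteer/DeviceDescriptors path is from early releases; later releases expose the device list under a different name, so check the documentation for your version.)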
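For a device that is not in the built-in list, page.emulate can take a hand-written descriptor with the two fields described above. This is a sketch: the user-agent string and viewport numbers are made-up placeholders rather than a real device, and `openAsDevice` is a hypothetical helper name.

```javascript
// Sketch: emulate a device that is not in the built-in list by passing
// page.emulate() a hand-written descriptor. The user-agent string and the
// viewport numbers are placeholders, not a real device.
const customPhone = {
  userAgent: 'Mozilla/5.0 (Linux; Android 10; ExamplePhone) Mobile',
  viewport: {
    width: 360,           // CSS pixels
    height: 740,
    deviceScaleFactor: 3, // screen zoom (devicePixelRatio)
    isMobile: true,       // treat the page as mobile
    hasTouch: true,       // enable touch events
  },
};

// Hypothetical helper: emulate first, then navigate, so the site sees the
// emulated user agent and viewport on its very first request.
async function openAsDevice(page, url, device = customPhone) {
  await page.emulate(device);
  await page.goto(url);
}
```

The order matters: emulating after `goto` would only affect later requests, so the first response could still be the desktop version of the page.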
Many pages require you to log in, and there are two approaches:
1. Have Puppeteer type in the account name and password
Common methods: click with page.click(selector[, options]), or focus an element with page.focus(selector).
For input, page.type(selector, text[, options]) enters a given string; set delay in options to type more slowly, like a real person. For lower-level input you can also press individual keys with keyboard.down(key[, options]) (together with keyboard.up).
2. If the site tracks login state with cookies, you can set them with page.setCookie(...cookies). To keep the cookies valid, visit the site periodically.
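The first approach (typing the credentials) can be sketched as a small helper. The selectors, the `loginWithForm` name and its option names are hypothetical and must be adapted to the target site; `page.type`, `page.click` and `page.waitForNavigation` are Page methods.

```javascript
// Sketch of approach 1: type the credentials and submit the form.
// loginWithForm and the selector option names are hypothetical; adjust
// them to the target site.
async function loginWithForm(page, { userSelector, passSelector, submitSelector, username, password }) {
  // A per-keystroke delay (in ms) makes typing look more human
  await page.type(userSelector, username, { delay: 100 });
  await page.type(passSelector, password, { delay: 100 });
  // Start waiting for the navigation before clicking, so a fast redirect
  // after submitting is not missed
  await Promise.all([
    page.waitForNavigation(),
    page.click(submitSelector),
  ]);
}
```

Wrapping the click and `waitForNavigation` in `Promise.all` is a common pattern: if the wait were started only after `await page.click(...)`, a quick redirect could already be over and the wait would time out.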
Tip: some websites require scanning a QR code to log in on certain pages, while other pages on the same domain support a normal login. In that case, try logging in on the page that allows it, then reuse the resulting cookies to open the QR-code page and skip the scan.
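The cookie approach and the tip above both come down to saving the cookies of a logged-in session and restoring them later. A minimal sketch: `exportCookies` and `importCookies` are hypothetical helper names, and cookies are kept as a JSON string (where to store it is up to you); `page.cookies()` and `page.setCookie(...cookies)` are Page methods.

```javascript
// Sketch of approach 2: save the cookies of a logged-in session as JSON and
// restore them later. exportCookies/importCookies are hypothetical names;
// page.cookies() and page.setCookie(...cookies) are Page methods.
async function exportCookies(page) {
  const cookies = await page.cookies(); // cookies for the page's current URL
  return JSON.stringify(cookies);
}

async function importCookies(page, json) {
  const cookies = JSON.parse(json);
  // Call this before navigating to the page that needs the login
  await page.setCookie(...cookies);
}
```

For the QR-code case: log in on the page that allows it, call `exportCookies`, and then in a new page call `importCookies` before `page.goto` to the QR-code-protected page.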
Simple example
const puppeteer = require('puppeteer');

(async () => {
  const browser = await puppeteer.launch({ headless: false });
  const page = await browser.newPage();
  await page.goto('https://baidu.com');
  // Type the query with a 100 ms delay per key, like a person would
  await page.type('#kw', 'puppeteer', { delay: 100 });
  await page.click('#su');
  // Wait until result links exist instead of sleeping for a fixed time
  await page.waitForSelector('.result a');
  // Return the href of the first result whose text mentions the article title
  const targetLink = await page.evaluate(() => {
    const link = [...document.querySelectorAll('.result a')].find(
      item => item.innerText && item.innerText.includes('introduction and practice of Puppeteer')
    );
    return link ? link.href : null;
  });
  if (targetLink) {
    await page.goto(targetLink);
    // Pause for a second so the opened page can be seen
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
  await browser.close();
})();
That is all for this introduction to Puppeteer. Thanks for reading!