
What is the use of web automated testing tool Pyppeteer

2025-07-01 Update From: SLTechnology News&Howtos

Shulou(Shulou.com)06/02 Report--

This article explains what the web automated testing tool Pyppeteer is used for. Many readers may not be very familiar with it, so it is shared here for reference. I hope you get a lot out of it. Let's take a look!

01. Introduction to Pyppeteer

Before introducing Pyppeteer, a word about Puppeteer. Puppeteer is a Node.js-based tool developed by Google. It drives the Chrome browser through Chrome's API, so that JavaScript code can control the browser to carry out tasks such as data crawling and automated testing of web applications.

Pyppeteer is essentially the Python version of Puppeteer. Here is a brief introduction to the two foundations of Pyppeteer, the Chromium browser and the asyncio framework:

1). Chromium

Chromium is a standalone browser project that Google started in order to develop its own browser, Google Chrome. It effectively serves as the experimental version of Chrome. Chromium is not as stable as Chrome, but it gets more features and is updated quickly: a new development build is usually released every few hours.

Pyppeteer's web automation is built on Chromium, and thanks to some of Chromium's features, installing and configuring Pyppeteer is very simple, as discussed in more detail later.

2). asyncio

asyncio is Python's asynchronous coroutine library. It was introduced into the standard library in version 3.4 and provides built-in support for asynchronous I/O. It has been called the most ambitious library in Python, and the official documentation describes it in detail.
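To make the coroutine idea concrete, here is a minimal sketch that uses only the standard library. The fetch function and its delays are made up for illustration; asyncio.sleep stands in for a slow I/O call such as a page load.

```python
import asyncio
import time


async def fetch(name: str, delay: float) -> str:
    """Stand-in for an I/O-bound call; the sleep hands control back to the event loop."""
    await asyncio.sleep(delay)
    return f"{name} done"


async def main() -> list:
    # gather() runs both coroutines concurrently and returns results in call order.
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))


if __name__ == "__main__":
    start = time.perf_counter()
    print(asyncio.run(main()))
    # The two waits overlap, so the total is close to 0.2 s rather than 0.4 s.
    print(f"elapsed: {time.perf_counter() - start:.2f} s")
```

While one coroutine is waiting, the event loop runs the other, which is the property Pyppeteer relies on later in this article.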

02. Installation and use

1). Minimal installation

The pyppeteer library is installed with the command pip install pyppeteer. As for the Chromium browser, a single pyppeteer-install command downloads it automatically to pyppeteer's default location.

If you skip the pyppeteer-install command, Chromium is downloaded and installed automatically the first time you use pyppeteer, with the same result. Overall, compared with Selenium, pyppeteer spares you the driver configuration step.

Of course, the automatic Chromium installation can sometimes fail for one reason or another. In that case, consider installing it manually: first, download the Chromium package that matches your system from the URLs below.

'linux': 'https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/575458/chrome-linux.zip'
'mac': 'https://storage.googleapis.com/chromium-browser-snapshots/Mac/575458/chrome-mac.zip'
'win32': 'https://storage.googleapis.com/chromium-browser-snapshots/Win/575458/chrome-win32.zip'
'win64': 'https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/575458/chrome-win32.zip'


Then unzip the package into the directory that pyppeteer designates for Chromium. The default directory differs between Windows and other systems.
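The manual steps can also be scripted. The following is a rough sketch using only the standard library and the snapshot URLs listed above. The helper names platform_key and download_and_unzip, and the target directory in the usage line, are assumptions made for illustration and are not part of pyppeteer.

```python
import io
import sys
import zipfile
from pathlib import Path
from urllib.request import urlopen

# Snapshot URLs as listed in the article (revision 575458).
CHROMIUM_URLS = {
    "linux": "https://storage.googleapis.com/chromium-browser-snapshots/Linux_x64/575458/chrome-linux.zip",
    "mac": "https://storage.googleapis.com/chromium-browser-snapshots/Mac/575458/chrome-mac.zip",
    "win32": "https://storage.googleapis.com/chromium-browser-snapshots/Win/575458/chrome-win32.zip",
    "win64": "https://storage.googleapis.com/chromium-browser-snapshots/Win_x64/575458/chrome-win32.zip",
}


def platform_key() -> str:
    """Map the running interpreter to one of the keys in CHROMIUM_URLS."""
    if sys.platform.startswith("linux"):
        return "linux"
    if sys.platform == "darwin":
        return "mac"
    if sys.platform in ("win32", "cygwin"):
        # A 64-bit Python can only run on 64-bit Windows.
        return "win64" if sys.maxsize > 2**32 else "win32"
    raise OSError(f"no Chromium snapshot listed for {sys.platform!r}")


def download_and_unzip(url: str, target_dir: Path) -> Path:
    """Download the zip archive at `url` and extract it into `target_dir`."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with urlopen(url) as response:
        archive = zipfile.ZipFile(io.BytesIO(response.read()))
    # Note: zipfile does not restore Unix permission bits, so on Linux or
    # macOS the extracted browser binary may still need to be made executable.
    archive.extractall(target_dir)
    return target_dir


if __name__ == "__main__":
    # "chromium-575458" is an arbitrary example directory, not pyppeteer's default.
    print(download_and_unzip(CHROMIUM_URLS[platform_key()], Path("chromium-575458")))
```

If the package ends up somewhere other than pyppeteer's default directory, one option is to pass the path of the unzipped browser binary to launch through its executablePath option.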

2). Use

Try it out once installation is done. Take a look at the following code. In the main function, it first creates a browser object, then opens a new tab, visits the Baidu home page, takes a screenshot of the current page and saves it as "example.png", and finally closes the browser. As mentioned earlier, pyppeteer is built on asyncio, so you need to use the async/await syntax with it.

When you run the above code, you will notice that no browser window pops up. This is because Pyppeteer runs the browser in headless mode by default. To make the browser visible, pass headless=False to the launch function. After the program runs, the screenshot of the web page appears in the same directory.
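A minimal sketch of the steps just described could look like this. It is a reconstruction from the description rather than the author's exact listing, and the exact Baidu address is an assumption.

```python
import asyncio


async def main() -> None:
    # Imported inside the function so this file still loads where pyppeteer is not installed.
    from pyppeteer import launch

    browser = await launch()                  # create the browser object
    page = await browser.newPage()            # open a new tab
    await page.goto("https://www.baidu.com")  # visit the Baidu home page
    await page.screenshot({"path": "example.png"})
    await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
```

Every browser and page call is awaited because each one is a coroutine.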

03. Hands-on: asynchronous fund data crawling

We keep saying that Pyppeteer is a very efficient web automated testing tool. The reason is that Pyppeteer is built on asyncio and almost all of its methods are coroutines, so it is convenient to build asynchronous programs with it and it supports asynchronous execution naturally.

Let's compare the efficiency of sequential execution and asynchronous execution:

1). Fund crawling

The experimental task is to crawl the net value data of open-end funds from the Tiantian Fund website. Each fund has a page showing its historical net value data. This page is rendered by JavaScript, so its content cannot be obtained directly with requests. You can therefore consider driving a simulated browser to crawl the data. (In fact, the fund net value data is available through an API; this task is only for demonstration and has no practical value.)

To make the effect more obvious, we crawl the net value data of the most recent 20 trading days for the top 50 funds on the fund list page.

2). Sequential execution

The basic idea of the program is to create one browser object and one page object, then visit the net value data page of each fund in turn and crawl the data. The core code is as follows:
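What follows is a sketch of the sequential approach, not the author's original listing. FUND_URL is a placeholder address, the fund codes in the usage line are made up, and get_data is reduced to a stub because the real page layout is not shown in the text.

```python
import asyncio
import time

# Placeholder address; the real fund page URL is not given in the text.
FUND_URL = "https://example.com/fund/{code}"


def get_data(html: str) -> dict:
    """Stub parser; a real one would extract the rows of the net value table."""
    return {"html_length": len(html)}


async def callurl_and_getdata(page, fundlist):
    """Visit each fund page in turn on one tab; each visit finishes before the next starts."""
    results = {}
    for code in fundlist:
        await page.goto(FUND_URL.format(code=code))
        results[code] = get_data(await page.content())
    return results


async def main(fundlist):
    from pyppeteer import launch  # lazy import so the sketch loads without pyppeteer

    browser = await launch()
    page = await browser.newPage()
    start = time.perf_counter()  # timing starts only after the browser is open
    results = await callurl_and_getdata(page, fundlist)
    print(f"sequential: {time.perf_counter() - start:.2f} s")
    await browser.close()
    return results


if __name__ == "__main__":
    asyncio.run(main(["000001", "000002"]))  # made-up fund codes
```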

The get_data() function in the code parses the net value data page and converts the data, and the get_all_codes() function obtains the fund codes of all open-end funds (more than 6,000 in total). Although the program uses the async/await structure, the net value data of the individual funds is fetched sequentially inside the callurl_and_getdata() function. It is written this way because the methods in pyppeteer are coroutines, so the program has to be built in this form.

To exclude the time spent opening the browser, we only time the page visits and data crawling. The result is 12.08 seconds.

3). Asynchronous execution

Next we rework the program. The functions stay the same; the main change is to turn the loop over fundlist into async task objects. The core code is as follows:
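Here is a sketch of the task-based variant, again not the author's original listing. It assumes one tab per fund, since a single tab cannot load several pages at once. FUND_URL, fetch_one and the fund codes are placeholders introduced for illustration.

```python
import asyncio
import time

# Placeholder address; the real fund page URL is not given in the text.
FUND_URL = "https://example.com/fund/{code}"


def get_data(html: str) -> dict:
    """Stub parser; a real one would extract the rows of the net value table."""
    return {"html_length": len(html)}


async def fetch_one(browser, code):
    """Load one fund page in its own tab and return (code, parsed data)."""
    page = await browser.newPage()
    try:
        await page.goto(FUND_URL.format(code=code))
        return code, get_data(await page.content())
    finally:
        await page.close()


async def callurl_and_getdata(browser, fundlist):
    # One task per fund; the event loop switches between them while pages load.
    tasks = [asyncio.create_task(fetch_one(browser, code)) for code in fundlist]
    return dict(await asyncio.gather(*tasks))


async def main(fundlist):
    from pyppeteer import launch  # lazy import so the sketch loads without pyppeteer

    browser = await launch()
    start = time.perf_counter()  # timing starts only after the browser is open
    results = await callurl_and_getdata(browser, fundlist)
    print(f"asynchronous: {time.perf_counter() - start:.2f} s")
    await browser.close()
    return results


if __name__ == "__main__":
    asyncio.run(main(["000001", "000002"]))  # made-up fund codes
```

With a long fund list, opening every tab at once can strain the browser. An asyncio.Semaphore around fetch_one is one way to cap the number of tabs open at the same time.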

The timing again starts after the browser is open. The running time is 2.18 seconds, roughly 5.5 times as fast as sequential execution (12.08 / 2.18 ≈ 5.5). As you can imagine, for a large crawling job that takes 10 hours sequentially, asynchronous execution may take less than 2 hours, so the optimization effect is very noticeable.

