
How to Use a Python Web Crawler to Get Video Download Links from Movie Paradise

Updated 2025-03-26. From: SLTechnology News&Howtos


Shulou(Shulou.com)06/01 Report--

This article explains how to use a Python web crawler to get video download links from Movie Paradise. Many people are unsure how to do this in practice, so the editor has consulted various materials and put together a simple, easy-to-follow method. We hope it answers the question. Let's get started!

[I. Project background]

Most of us have had the frustrating experience of finding it hard to download movies. Downloading them one after another is tedious, and there is no easy way to see at a glance which movies have been updated most recently.

Today, the editor uses Movie Paradise as an example to show you how to list your favorite movies at a glance and get their download links.

[II. Project preparation]

First, install PyCharm. For installation, see this tutorial: "Python Environment Building: detailed tutorials on Python and PyCharm installation", recommended for Python beginners.

The website of Movie Paradise:

https://www.ygdy8.net/html/gndy/dyzz/list_23_1.html

We need to install a few libraries. To do so, open PyCharm, click File, and then click Settings.

In the window that opens, click Project: (your project name), then Project Interpreter, and click the plus sign to install the library we need (requests; the time and re modules are part of Python's standard library and need no installation), as shown in the following figure.

If the interpreter fails to load, you can refer to this hands-on tutorial: a simple tutorial on how to configure the Python interpreter after installing PyCharm.

If the corresponding library is still missing, you can download and install it as follows.

[III. Project implementation]

We need the requests, time, and re modules, as shown in the following figure.

We use encapsulation, implementing each part of the functionality as a method. The first step is to write the framework: define a class FilmSky, define an __init__ method that takes self, and then define a main method (main). Finally, call the main method. The code is as follows:

The time module is used to add a delay between requests, which makes the crawler less likely to trigger the site's anti-crawling measures.
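A minimal sketch of that framework might look like the following. The class name FilmSky and the main method come from the text; the delay attribute and its one-second default are assumptions added for illustration.

```python
import time  # used to pause between requests


class FilmSky:
    def __init__(self, delay=1.0):
        # Seconds to wait between requests (assumed value, tune as needed).
        self.delay = delay

    def main(self):
        # Placeholder: the crawl steps are filled in later.
        # For now it only demonstrates the delay.
        time.sleep(self.delay)


if __name__ == "__main__":
    spider = FilmSky()
    spider.main()
```

The requests and re modules are left out here because this skeleton does not use them yet; they come in with the request and parsing steps.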

First, let's look at how the address changes when we move to the next page of the site.

Clicking through the first few pages shows that only the number after "list_23_" in the address changes (1, 2, 3, 4, 5, and so on).

We can put {} in place of the changing value, like this:

https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html

So we initialize the URL template and construct the request headers in the __init__ method.

In the main method, we use a for loop to go through the page URLs.
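These two steps could be sketched as follows. The URL template is the one given in the text; the User-Agent value, the page_urls helper name, and the page range 1 to 5 are assumptions made for illustration.

```python
class FilmSky:
    def __init__(self):
        # {} is replaced by the page number.
        self.url = "https://www.ygdy8.net/html/gndy/dyzz/list_23_{}.html"
        # Assumed header value; any ordinary browser User-Agent should work.
        self.headers = {"User-Agent": "Mozilla/5.0"}

    def page_urls(self, first=1, last=5):
        # Hypothetical helper: build the list-page URLs for pages first..last.
        return [self.url.format(page) for page in range(first, last + 1)]

    def main(self):
        # Loop over the generated URLs; later steps request each one.
        for url in self.page_urls():
            print(url)
```

Calling FilmSky().main() prints one list-page URL per line, which is an easy way to check the template before any request is sent.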

This gives the result shown in the following figure:

That means you are halfway there. Keep going!

Next we need to send a request to each of these URLs. To keep the code easy to follow, we write this step inside the class.

We use requests to fetch the page. The site's encoding is gbk (how do you find out a site's encoding?).

Open the site, right-click and inspect the page, and look at the meta tag in the head. For this site you can see charset="gb2312".

gb2312 is the page's encoding. Two encodings are common: utf-8 and gbk. Because gbk is a superset of gb2312, decoding this site with gbk works.

We can verify that the request went through. If print(html) shows a complete HTML page, the request was successful.
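A small sketch of the request step, under some assumptions: the function names decode_page and get_html, the 10-second timeout, and errors="ignore" are illustrative choices, not taken from the article.

```python
def decode_page(raw_bytes, encoding="gbk"):
    """Turn raw response bytes into text using the site's encoding."""
    # errors="ignore" drops bytes that are invalid in the encoding
    # (an assumed choice so one bad byte does not abort the crawl).
    return raw_bytes.decode(encoding, errors="ignore")


def get_html(url, headers):
    """Fetch one page and return its decoded HTML."""
    # requests is third-party (pip install requests). It is imported here
    # so that decode_page can be tried without requests installed.
    import requests

    response = requests.get(url, headers=headers, timeout=10)
    return decode_page(response.content)
```

Printing the return value of get_html for the first list page, with a headers dictionary such as {"User-Agent": "Mozilla/5.0"}, is the check described above: a full HTML document means the request succeeded.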

Next, let's define another method, this one for parsing the page's HTML.

We parse the data with regular expressions. Right-click and inspect the page to find the href of the link tags inside the table we are interested in.

So we first locate the table and then work inward layer by layer, as in the following figure.

In the regular expression, (.*?) captures the part you want, while a bare .*? skips over the tags in between until you reach the region you need. A for loop then goes through each captured URL; for each one we request the second-level page and parse it.
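To make the roles of (.*?) and .*? concrete, here is a sketch run against a made-up fragment. The sample markup, the class name tbspan, and the names LIST_PATTERN and parse_list are assumptions for illustration; the real page's tags should be checked in the browser before writing the pattern.

```python
import re

# Assumed, simplified stand-in for the list page's markup.
SAMPLE_LIST_HTML = """
<table class="tbspan">
  <tr><td><b><a href="/html/gndy/dyzz/20250101/10001.html" class="ulink">Film A</a></b></td></tr>
</table>
<table class="tbspan">
  <tr><td><b><a href="/html/gndy/dyzz/20250102/10002.html" class="ulink">Film B</a></b></td></tr>
</table>
"""

# .*? skips ahead lazily; (.*?) captures the href and the link text.
# re.S lets "." match newlines so the pattern can span several lines.
LIST_PATTERN = re.compile(
    r'<table class="tbspan".*?<a href="(.*?)".*?>(.*?)</a>', re.S
)


def parse_list(page_html, base="https://www.ygdy8.net"):
    """Return (name, absolute detail-page URL) pairs from one list page."""
    results = []
    for href, name in LIST_PATTERN.findall(page_html):
        results.append((name, base + href))
    return results
```

Each pair returned by parse_list(SAMPLE_LIST_HTML) holds a movie name and the address of its second-level page, which is what the for loop then requests.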

Some pages have no download link, which would leave the list of download links out of step with the list of movies. So we add a check: if the number of matched download links is greater than 0, we use the link as usual; otherwise we store a null value, so the two lists stay aligned. Finally, the result is returned, as shown in the following figure.
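The check could be sketched like this. The helper name first_or_none and the sample data are hypothetical; the point is only that every movie gets exactly one entry, even when nothing matched.

```python
def first_or_none(matches):
    """Return the first matched link, or None when nothing matched."""
    if len(matches) > 0:
        return matches[0]
    return None


# Made-up example: the second movie's page yielded no link.
names = ["Film A", "Film B", "Film C"]
found = [["ftp://example.com/a.mkv"], [], ["ftp://example.com/c.mkv"]]

# One entry per movie, so names[i] always belongs to links[i].
links = [first_or_none(matches) for matches in found]
```

Without the None placeholder, the links list would be one item shorter than names, and every movie after the gap would be paired with the wrong link.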

Open the second-level page shown in the figure, then right-click the download link and inspect it, as shown below:

We again use a regular expression to extract the download link address, as shown in the following figure:

The raw result is not very tidy, so let's clean up the link, as shown in the following figure:
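The article's own cleanup code is not reproduced in the text, so the following is only one plausible version. It assumes the raw match may carry stray whitespace and HTML entities such as &amp;amp;; the sample markup and the names DOWNLOAD_PATTERN, clean_link, and parse_download are likewise assumptions.

```python
import html
import re

# Assumed, simplified stand-in for the detail page's markup.
SAMPLE_DETAIL_HTML = """
<td bgcolor="#fdfddf">
  <a href=" magnet:?xt=urn:btih:0123456789abcdef&amp;dn=film-a.mp4 ">download</a>
</td>
"""

# Skip lazily to the first link inside the highlighted cell and capture it.
DOWNLOAD_PATTERN = re.compile(r'<td bgcolor="#fdfddf">.*?<a href="(.*?)"', re.S)


def clean_link(raw):
    """Decode HTML entities and trim surrounding whitespace."""
    return html.unescape(raw).strip()


def parse_download(page_html):
    """Return the cleaned download link, or None if the page has none."""
    matches = DOWNLOAD_PATTERN.findall(page_html)
    return clean_link(matches[0]) if matches else None
```

re.findall returns a list, so taking the first element and cleaning it turns the raw match into a plain string that can be stored or printed directly.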

This gives the result shown in the following figure:

Finally, we save each movie's name and its download link in a dictionary:
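A sketch of that last step, assuming one dictionary per movie. The key names "name" and "link", the helper to_record, and the sample values are illustrative, not taken from the article.

```python
def to_record(name, link):
    """Bundle one movie's name and download link into a dictionary."""
    return {"name": name, "link": link}


# Made-up values standing in for the parsed results.
names = ["Film A", "Film B"]
links = ["ftp://example.com/a.mkv", None]

# zip pairs each name with the link at the same position.
films = [to_record(name, link) for name, link in zip(names, links)]
```

Keeping both fields in one dictionary means a movie and its link travel together, so later code cannot pair them up incorrectly.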

Finally, let's tidy up the request code, which is a little repetitive.

Once the request headers are stored in a variable, every request can simply call one shared method, as shown in the following figure:
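One way this refactoring could look is sketched below. The method name get_html, the User-Agent value, the timeout, and the delay default are assumptions; the idea from the text is that the headers live in one place and both the list pages and the second-level pages go through the same method.

```python
import time


class FilmSky:
    def __init__(self, delay=1.0):
        # Headers are stored once and reused by every request.
        self.headers = {"User-Agent": "Mozilla/5.0"}  # assumed value
        self.delay = delay  # assumed pause between requests, in seconds

    def get_html(self, url):
        """Shared request method for list pages and detail pages."""
        # requests is third-party (pip install requests); imported here so
        # the class can be defined even where requests is not installed.
        import requests

        response = requests.get(url, headers=self.headers, timeout=10)
        time.sleep(self.delay)  # wait before the next request
        # The site is served as gb2312, which gbk can decode.
        return response.content.decode("gbk", errors="ignore")
```

With this in place, the list-page step and the detail-page step each shrink to a single self.get_html(url) call instead of repeating the request and decoding logic.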

After the program runs, you can see the output, as shown in the following figure:

That concludes this walkthrough of how to use a Python web crawler to get video download links from Movie Paradise. We hope it has answered your questions. Combining theory with practice is the best way to learn, so go and try it yourself! If you want to learn more, please keep following the website; the editor will continue to bring you more practical articles.


© 2024 shulou.com SLNews company. All rights reserved.
