
How do you crawl videos with Python?

2025-01-17 Update From: SLTechnology News&Howtos shulou


Shulou(Shulou.com)06/02 Report--

This article explains how to crawl videos with Python. Many people run into difficulties with this in real-world cases, so the sections below walk through how to deal with them.

Crawling the question explanations

Approach

1. Crawl all the explanation links for the questions and store them in a separate file.

2. Some links stall, so the program has to be interrupted and started again. As with storing pictures, the goal is to be able to resume from where the crawl stopped.

3. Writing to a file is not the same as saving a picture, though. For this case, the initial idea is to read the links from the link file into a list and delete each link as soon as it has been extracted.
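
The three steps above can be sketched roughly as follows. This is only a minimal illustration, not the article's code: the function names (load_links, save_links, process_all) and the one-link-per-line file layout are assumptions made for the example.

```python
import os


def load_links(path):
    """Read the pending links from the link file, one link per line."""
    if not os.path.exists(path):
        return []
    with open(path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def save_links(path, links):
    """Overwrite the link file with the links that are still pending."""
    with open(path, "w", encoding="utf-8") as f:
        for link in links:
            f.write(link + "\n")


def process_all(path, handle):
    """Handle every pending link and drop it from the file once it is done.

    `handle` is a hypothetical callback that crawls one link. If it raises
    or the program is interrupted, the unfinished link is still in the file,
    so the next run picks up where this one stopped.
    """
    links = load_links(path)
    while links:
        handle(links[0])
        links.pop(0)
        save_links(path, links)
```

Rewriting the file after every link is slow for very long lists, but it keeps the file an exact record of what is left to do.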

Crawl pictures and animations simultaneously

Because the questions in Subject 4 include animations, the website being crawled serves them as short videos in mov format.


Also getting the video link

In part one of this series, pictures were obtained by passing the options and answers to a BeautifulSoup object again and then extracting the img tag; if a question had no picture, the extracted value was empty. Here, both the img and video tags are extracted, and if a question has neither a picture nor a video, the result is likewise empty. Only one line of code needs to change:

    img = soup.find_all(['img', 'video'])

Getting the image or GIF suffix

In part one, for convenience, the file name was simply the string followed by a fixed .png suffix. Now the suffixes are no longer all the same, so this has to be handled properly (better to write the code than to be lazy...).

Solution code:

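    # i is the question number, defined earlier in the crawler
    if img:
        for im in img:
            src = im.get('src')
            # The last dot-separated piece of the link is the suffix
            suffix = src.split('.')[-1]
            filename = str(i) + '.' + suffix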

If the question has a picture or an animation, split its link on '.'. The last element is the suffix.
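
Splitting on '.' works as long as the link ends with the file name. As a hedged alternative, the sketch below uses only the standard library to take the suffix from the path part of the link, so a query string such as ?v=1.2 does not end up in the file name. The helper names get_suffix and make_filename, the example URLs, and the fallback to png (the suffix used in part one) are assumptions for illustration.

```python
import posixpath
from urllib.parse import urlparse


def get_suffix(src, default="png"):
    """Return the file extension of a link without the leading dot."""
    path = urlparse(src).path           # drops any ?query or #fragment
    ext = posixpath.splitext(path)[1]   # ".gif", ".mov", or "" if there is none
    return ext[1:].lower() if ext else default


def make_filename(i, src):
    """Build the file name from the question number i and the link's suffix."""
    return str(i) + "." + get_suffix(src)


if __name__ == "__main__":
    # Hypothetical links, used only to show the behaviour.
    print(make_filename(12, "http://www.example.com/img/12.gif"))
    print(make_filename(13, "http://www.example.com/video/13.mov?v=1.2"))
```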

Resuming from a breakpoint

I do not know why the program pauses. I tried imitating a browser and catching exceptions, and neither helped, so I decided to make the crawl resumable instead.

Each image corresponds to one link, and inevitably some link gets stuck (that is my guess).

Solution:

After getting the link and generating the file name, do not open the link right away. First check, by file name, whether the image is already in the folder. If it is, discard the link and move on to the next one.

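    import os

    # i (question number) and saveImg() come from the earlier code in this series
    if img:
        for im in img:
            src = im.get('src')
            suffix = src.split('.')[-1]
            filename = str(i) + '.' + suffix
            # Already downloaded: skip this link and go on to the next one
            if os.path.exists('picture/' + filename):
                continue
            saveImg(src, filename)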
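
The article does not show saveImg itself. The sketch below is one possible stand-in, written under that assumption: the names fetch and save_if_missing, the 10-second timeout, and the .part temporary file are all illustrative choices, not the author's implementation. The timeout is there so that a single stuck link raises an error instead of hanging the whole run.

```python
import os
import urllib.request


def fetch(url, timeout=10):
    """Download a URL and return its bytes; the timeout stops a stuck link from hanging."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def save_if_missing(url, filename, folder="picture", fetcher=fetch):
    """Save url as folder/filename unless that file already exists.

    Returns True if the file was downloaded, False if it was skipped.
    """
    os.makedirs(folder, exist_ok=True)
    target = os.path.join(folder, filename)
    if os.path.exists(target):
        return False                      # downloaded on an earlier run
    try:
        data = fetcher(url)
    except OSError as exc:                # URLError and socket timeouts are OSError subclasses
        print("skipped", url, exc)
        return False
    tmp = target + ".part"                # write to a temporary name first
    with open(tmp, "wb") as f:
        f.write(data)
    os.replace(tmp, target)               # rename only after the write has finished
    return True
```

Writing to a temporary name and renaming afterwards means an interrupted download never leaves a half-written file that the exists check would mistake for a finished one.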

