Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How does Python crawl the music material of the website

2025-02-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >

Share

Shulou(Shulou.com)06/01 Report--

This article mainly explains "how Python crawls the music material of the website". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "Python how to crawl website music material"!

Preface basic development environment

Python 3.6

Pycharm

The use of related modules import osimport concurrent.futuresimport requestsimport parsel

Install Python and add it to the environment variable, and pip installs the relevant modules you need.

First, determine the demand

If you want to verify that this link is the real download address for audio, you can copy the link and paste it into a new window.

Https://downsc.chinaz.net/Files/DownLoad/sound1/202102/s830.mp3

The old idea is that some of the parameters in the copy link are searched in the developer's tools, and it is clear that S830 is the ID of audio.

Search S830 to find the source and find the download address in the web page. After obtaining the audio download address, you need to splice the url yourself.

Web data is not complex, relatively speaking, it is relatively simple.

1. Request the current web page data to obtain the audio address and audio title.

2. Just save the download

Third, code implementation

Get audio ID and audio title

Def main (html_url): html_data = get_response (html_url). Text selector = parsel.Selector (html_data) lis = selector.css ('# AudioList. Container. Audio-item') for li in lis: name = li.css ('.name::text'). Get (). Strip () src = li.css (' audio::attr (src)'). Get () audio_url = 'https:' + src save (name) Audio_url) print (name, audio_url)

Save data

Def save (name, audio_url): header = {'Upgrade-Insecure-Requests':' 1century, 'User-Agent':' Mozilla/5.0 (Windows NT 10.0) WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'} audio_content = requests.get (url=audio_url, headers=header). Content path = 'audio\\' if not os.path.exists (path): os.mkdir (path) with open (path + name + '.mp3', mode='wb') as f: f.write (audio_content)

Here you want to give a new headers parameter, otherwise you won't be able to download it. The code runs all the time, but there is no response.

Multi-thread crawling

If _ _ name__ ='_ main__': executor = concurrent.futures.ThreadPoolExecutor (max_workers=5) for page in range (1,31): url = f 'https://sc.chinaz.com/yinxiao/index_{page}.html' # main (url) executor.submit (main, url)

At this point, I believe you have a deeper understanding of "Python how to crawl website music material". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Internet Technology

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report