In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-27 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Internet Technology >
Share
Shulou(Shulou.com)06/01 Report--
This article mainly explains "how Python crawls the music material of the website". Interested friends may wish to have a look. The method introduced in this paper is simple, fast and practical. Let's let the editor take you to learn "Python how to crawl website music material"!
Preface basic development environment
Python 3.6
Pycharm
The use of related modules import osimport concurrent.futuresimport requestsimport parsel
Install Python and add it to the environment variable, and pip installs the relevant modules you need.
First, determine the demand
If you want to verify that this link is the real download address for audio, you can copy the link and paste it into a new window.
Https://downsc.chinaz.net/Files/DownLoad/sound1/202102/s830.mp3
The old idea is that some of the parameters in the copy link are searched in the developer's tools, and it is clear that S830 is the ID of audio.
Search S830 to find the source and find the download address in the web page. After obtaining the audio download address, you need to splice the url yourself.
Web data is not complex, relatively speaking, it is relatively simple.
1. Request the current web page data to obtain the audio address and audio title.
2. Just save the download
Third, code implementation
Get audio ID and audio title
Def main (html_url): html_data = get_response (html_url). Text selector = parsel.Selector (html_data) lis = selector.css ('# AudioList. Container. Audio-item') for li in lis: name = li.css ('.name::text'). Get (). Strip () src = li.css (' audio::attr (src)'). Get () audio_url = 'https:' + src save (name) Audio_url) print (name, audio_url)
Save data
Def save (name, audio_url): header = {'Upgrade-Insecure-Requests':' 1century, 'User-Agent':' Mozilla/5.0 (Windows NT 10.0) WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Safari/537.36'} audio_content = requests.get (url=audio_url, headers=header). Content path = 'audio\\' if not os.path.exists (path): os.mkdir (path) with open (path + name + '.mp3', mode='wb') as f: f.write (audio_content)
Here you want to give a new headers parameter, otherwise you won't be able to download it. The code runs all the time, but there is no response.
Multi-thread crawling
If _ _ name__ ='_ main__': executor = concurrent.futures.ThreadPoolExecutor (max_workers=5) for page in range (1,31): url = f 'https://sc.chinaz.com/yinxiao/index_{page}.html' # main (url) executor.submit (main, url)
At this point, I believe you have a deeper understanding of "Python how to crawl website music material". You might as well do it in practice. Here is the website, more related content can enter the relevant channels to inquire, follow us, continue to learn!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.