2025-03-31 Update From: SLTechnology News&Howtos
Shulou(Shulou.com)06/01 Report--
This article shows how to crawl Douyin's popular music with a Python crawler. The content is straightforward and clearly organized, and I hope it resolves your doubts. Follow along as we study how a Python crawler crawls Douyin's hot music.
Crawling Douyin's popular music
This one is relatively simple. Here is the result of running the code.
Get the music URLs from https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1
Open the web page, press F12 to open the developer tools, then F5 to refresh.
The data shown above is what we need to crawl.
Parse it with BeautifulSoup and go straight to the code.
import time

import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}
# Save path
save_path = "G:\\Music\\douyin\\"
url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1"
# Get the response
res = requests.get(url, headers=headers)
# Parse with BeautifulSoup
soup = BeautifulSoup(res.text, 'lxml')
# Select the pagination tag to get the maximum page number
max_page = soup.select('li.page-item > a')[-2].text
# Loop over every page
for page in range(int(max_page)):
    page_url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page={}".format(page + 1)
    page_res = requests.get(page_url, headers=headers)
    soup = BeautifulSoup(page_res.text, 'lxml')
    lis = soup.select('li.rankbox-item')
    singers = soup.select('div.meta')
    music_names = soup.select('h3.tit > a')
    for i in range(len(lis)):
        music_url = "http:" + lis[i].get('data-audio')
        print("title:" + music_names[i].text, singers[i].text, "link:" + music_url)
        try:
            download_file(music_url,
                          save_path + music_names[i].text + '-' + singers[i].text.replace('/', '') + ".mp3")
        except Exception:
            pass
    print("Page {} complete~".format(page + 1))
    time.sleep(1)
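The max_page extraction above relies on the pagination list's second-to-last link holding the highest page number. On a minimal stand-in for that markup (the HTML below is an assumption for illustration, not the real page), the same selector behaves like this:

```python
from bs4 import BeautifulSoup

# Minimal stand-in for the site's pagination markup (assumed, not the real page)
html = """
<ul>
  <li class="page-item"><a>1</a></li>
  <li class="page-item"><a>2</a></li>
  <li class="page-item"><a>27</a></li>
  <li class="page-item"><a>Next</a></li>
</ul>
"""
soup = BeautifulSoup(html, "html.parser")
# The second-to-last pagination link is the highest page number;
# the last one is the "Next" button
max_page = soup.select("li.page-item > a")[-2].text
print(max_page)  # 27
```

Indexing with [-2] rather than [-1] skips the trailing "Next" link, which is why the original code uses it.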
Pass the URL of the file obtained above to the download function:
def download_file(src, file_path):
    # Stream the response body instead of downloading it all at once
    r = requests.get(src, stream=True)
    # Open the file for binary writing
    with open(file_path, "wb") as f:
        # tqdm wraps any iterable to show a progress bar: for data in tqdm(iterable)
        for data in tqdm(r.iter_content(chunk_size=512)):
            f.write(data)
    return file_path
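The chunked write inside download_file can be exercised without a network connection. In the sketch below, a requests.Response is assembled by hand around an in-memory body (the save_stream helper and the BytesIO body are assumptions for illustration), and tqdm is left out so the example stays dependency-light:

```python
import io

import requests
from urllib3 import HTTPResponse

def save_stream(response, file_path, chunk_size=512):
    # Write the streamed body to disk chunk by chunk, as download_file does
    with open(file_path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            if chunk:
                f.write(chunk)
    return file_path

# Hand-built Response: BytesIO stands in for the network socket
body = b"fake mp3 bytes " * 100
resp = requests.Response()
resp.status_code = 200
resp.raw = HTTPResponse(body=io.BytesIO(body), preload_content=False)

save_stream(resp, "demo.mp3")
with open("demo.mp3", "rb") as f:
    print(len(f.read()))  # 1500
```

Because iter_content pulls fixed-size chunks off the open connection, memory use stays flat no matter how large the file is.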
Finally, an explanation of the response body workflow.
By default, when you make a web request, the response will be downloaded immediately. You can override this behavior with the stream parameter and defer downloading the response body until you access the Response.content property:
tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)
At this point only the response headers have been downloaded and the connection remains open, which lets us make retrieval of the content conditional:
if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...
You can further control the workflow with the Response.iter_content and Response.iter_lines methods, or read from the underlying urllib3.HTTPResponse via Response.raw.
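As a sketch of these hooks (the hand-built Response below is an assumption, used so the example runs offline), iter_lines yields the body line by line, and Response.raw is exactly the underlying urllib3.HTTPResponse:

```python
import io

import requests
from urllib3 import HTTPResponse

# Hand-built Response around an in-memory body, as if it came off the wire
resp = requests.Response()
resp.status_code = 200
resp.raw = HTTPResponse(body=io.BytesIO(b"line1\nline2\nline3\n"),
                        preload_content=False)

print(list(resp.iter_lines()))  # [b'line1', b'line2', b'line3']
print(type(resp.raw).__name__)  # HTTPResponse
```

iter_lines is handy for line-oriented streams such as logs or server-sent events, while raw gives byte-level access when you need it.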