
How to crawl popular Douyin music with a Python crawler



This article mainly shows how a Python crawler can download popular Douyin music. The content is easy to understand and clearly organized; I hope it helps clear up your doubts. Follow the editor below to study and learn how a Python crawler crawls popular Douyin music.

Crawling Douyin's popular music

This one is fairly simple; below are the code and the result of running it.

Get the music URLs from https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1

Open the page, press F12 to open the developer tools, then press F5 to refresh.

The data you need to scrape is what shows up in those page elements.

Parse it with BeautifulSoup and go straight to the code.

import time
import requests
from bs4 import BeautifulSoup
from tqdm import tqdm

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.99 Safari/537.36'
}

# Save path
save_path = "G:\\Music\\douyin\\"
url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1"

# Get the response
res = requests.get(url, headers=headers)
# Parse with BeautifulSoup
soup = BeautifulSoup(res.text, 'lxml')
# Select the pagination links to get the maximum page number
max_page = soup.select('li.page-item > a')[-2].text

# Loop over every page
for page in range(int(max_page)):
    page_url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page={}".format(page + 1)
    page_res = requests.get(page_url, headers=headers)
    soup = BeautifulSoup(page_res.text, 'lxml')
    lis = soup.select('li.rankbox-item')
    singers = soup.select('div.meta')
    music_names = soup.select('h3.tit > a')
    for i in range(len(lis)):
        music_url = "http:" + lis[i].get('data-audio')
        print("title: " + music_names[i].text, singers[i].text, "link: " + music_url)
        try:
            download_file(music_url,
                          save_path + music_names[i].text + '-' + singers[i].text.replace('/', '') + ".mp3")
        except Exception:
            pass
    print("Page {} complete ~".format(page + 1))
    time.sleep(1)

Pass the URL of each file obtained above to the download function (in the actual script, define download_file before the loop so it is available when called).

def download_file(src, file_path):
    # Stream the response body instead of downloading it all at once
    r = requests.get(src, stream=True)
    # Open the target file for binary writing
    f = open(file_path, "wb")
    # for chunk in r.iter_content(chunk_size=512):
    #     if chunk:
    #         f.write(chunk)
    for data in tqdm(r.iter_content(chunk_size=512)):
        # tqdm progress bar usage: for data in tqdm(iterable)
        f.write(data)
    f.close()
    return file_path
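
For reference, here is a minimal alternative sketch of the download helper that closes the connection and the file with context managers and feeds the Content-Length header to tqdm so the progress bar shows a real percentage. The name robust_download and the chunk size are my own choices, not part of the original code.

import requests
from tqdm import tqdm

def robust_download(src, file_path, chunk_size=1024):
    # Stream the response; the with-blocks close the connection and the file automatically
    with requests.get(src, stream=True) as r:
        r.raise_for_status()
        total = int(r.headers.get('content-length', 0))
        with open(file_path, "wb") as f, tqdm(total=total, unit='B', unit_scale=True) as bar:
            for chunk in r.iter_content(chunk_size=chunk_size):
                if chunk:  # skip keep-alive chunks
                    f.write(chunk)
                    bar.update(len(chunk))
    return file_path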

Next, a brief explanation of the response-body streaming workflow.

By default, when you make a web request, the response will be downloaded immediately. You can override this behavior with the stream parameter and defer downloading the response body until you access the Response.content property:

tarball_url = 'https://github.com/kennethreitz/requests/tarball/master'
r = requests.get(tarball_url, stream=True)

At this point only the response headers have been downloaded and the connection remains open, so we can decide whether to fetch the content based on a condition:

if int(r.headers['content-length']) < TOO_LONG:
    content = r.content
    ...

You can further control the download with the Response.iter_content and Response.iter_lines methods, or read directly from the underlying urllib3.HTTPResponse via Response.raw.
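
As a quick illustration of those options (reusing the music-list URL from above purely as an example), a short sketch:

import requests

url = "https://kuaiyinshi.com/hot/music/?source=dou-yin&page=1"

# Iterate over the body line by line without loading it all into memory
r = requests.get(url, stream=True)
for line in r.iter_lines(decode_unicode=True):
    if line:
        print(line)

# Or read raw bytes straight from the underlying urllib3.HTTPResponse
r = requests.get(url, stream=True)
first_bytes = r.raw.read(10)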
