How to crawl NetEase Cloud Music with Python

2025-04-03 Update From: SLTechnology News&Howtos shulou

This article shares how to crawl NetEase Cloud Music with Python. Most readers are probably not very familiar with the topic, so it is offered here for reference. I hope you learn a lot from it. Let's get started!

Now to the topic itself.

First, open the target page and analyze its structure, as follows.

The three arrows above point to the data we want: the commenting user, the comment text, and the number of likes. All of them can be extracted with regular expressions. Next, work out how to get the data on the next page, again with the developer tools. When you click to the next page, the URL does not change, which means the page is loaded dynamically, so the data cannot be found in the current page source. Look for it among the XHR requests instead: open the Network tab, click to the next page, and the data you want is indeed there.

Seeing this, I excitedly started writing the code.

I ran it and nothing came back, even though the status code was 200. The request evidently succeeded but returned no data. A closer look at this request in the Network tab shows that it is a POST request and that it requires two POST parameters, params and encSecKey.
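As a rough illustration of what such a request looks like, the sketch below builds (but does not send) a form-encoded POST carrying the two fields. The endpoint URL and the placeholder values are assumptions for illustration only; the real URL and values are the ones shown in the Network tab.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint: replace with the request URL shown in the Network tab.
COMMENTS_URL = "https://example.com/comments"


def build_comment_request(params, enc_sec_key):
    """Build (without sending) a POST request with the two form fields."""
    body = urlencode({"params": params, "encSecKey": enc_sec_key}).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "User-Agent": "Mozilla/5.0",  # assumed header, many sites expect one
    }
    return Request(COMMENTS_URL, data=body, headers=headers, method="POST")


# Placeholder values standing in for the strings copied from the browser.
request = build_comment_request("copied-params-value", "copied-encSecKey-value")
print(request.get_method(), request.data)
```

Passing the returned object to urllib.request.urlopen would actually send it. A plain GET of the page URL carries neither field, which is consistent with the empty result described above.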

Both values are long strings of digits and letters, so I guessed they were encrypted, but they can still be copied to see whether they work. Next, look at the Response: it is JSON, not HTML, so it has to be parsed with the json library.

Now start writing the code, copying in the two parameters above to see what happens.

Now extract the commenting user, the comment text, and the number of likes for each comment.

As you can see, once json.loads() has converted the data into a Python dictionary, the desired fields can be taken out. But how do we get the next page? Copying and pasting those two parameters every time is not practical. The only alternatives are to give up or to crack the two parameters, so I kept going. Back to the Network tab: if the parameters are encrypted, the encryption must be done in JavaScript.
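The json.loads() step can be sketched as follows. The sample body and the field names (comments, user, nickname, content, likedCount) are assumptions made up for illustration; check them against the actual Response shown in the developer tools.

```python
import json

# Illustrative response body; the structure and field names are assumptions.
sample_body = """
{"comments": [
  {"user": {"nickname": "alice"}, "content": "Great song", "likedCount": 12},
  {"user": {"nickname": "bob"}, "content": "On repeat", "likedCount": 3}
]}
"""


def extract_comments(body):
    """Convert the JSON text to a dict and pull out user, comment, and likes."""
    data = json.loads(body)
    rows = []
    for item in data.get("comments", []):
        rows.append({
            "user": item["user"]["nickname"],
            "content": item["content"],
            "likes": item["likedCount"],
        })
    return rows


rows = extract_comments(sample_body)
for row in rows:
    print(row)
```

Using data.get("comments", []) means a response without that key yields an empty list instead of raising a KeyError.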

The initiator of the request we just looked at is core.js, so download that file and study it carefully.

Beautify the file and save it, then search for the encSecKey parameter. (P.S. The JS beautification tool is at www.css88.com/tool/js_bea.)

The window.asrsea() method takes four parameters. Set the function itself aside for now and look at what the four parameters are. There is no need to study where they come from; we only need to know their values, so we can add some code that prints them and use Fiddler to debug.

The code is as follows

This prints each of the parameters above separately, so that we can inspect what goes into params. Then do the following in Fiddler.

After completing the settings above, refresh the page and the parameter information will appear in the console. If it does not, the file was cached during an earlier visit, so clear the browser cache (under the option for clearing browsing data).

In the first parameter, rid contains the id of the song and is obviously related to the comments. After turning a few pages, I found that offset is the comment offset: offset = (page - 1) * 20. total is true on the first page and false on the other pages.

Obtained the same way, the second parameter is 010001.
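A minimal sketch of building the first parameter for a given page, using only the three fields named above. The function name, the exact rid format, and the value types (numbers and booleans rather than strings) are assumptions. The real payload may contain further fields, so compare the output with what the console prints.

```python
import json

PAGE_SIZE = 20  # comments per page, taken from the offset formula above


def build_first_param(rid, page):
    """Return the JSON text for one page of comments (hypothetical helper)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    payload = {
        "rid": rid,                        # copy the exact value from the console
        "offset": (page - 1) * PAGE_SIZE,  # 0, 20, 40, ...
        "total": page == 1,                # true on page 1, false afterwards
    }
    return json.dumps(payload)


print(build_first_param("song-id-placeholder", 1))
print(build_first_param("song-id-placeholder", 3))
```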

The third parameter is: 00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7

The fourth parameter is: 0CoJUm6Qyw8W8jud

Next, work out how the window.asrsea() method operates, again by reading the js file.

Studying the code shows that function a randomly generates 16 characters, and function b is AES encryption in CBC mode with the IV (the "offset") 0102030405060708. In function d, params is encrypted twice in a row: the first encryption uses the first parameter as the plaintext and the fourth parameter as the key; the second encryption uses the result of the first encryption as the plaintext and the random 16-character string as the key. encSecKey is produced by RSA encryption: the second parameter is the public exponent, the third parameter is the modulus, and the plaintext is the random string from a.
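The random string and the RSA step can be sketched with the standard library alone. This is an illustration under assumptions, not the site's exact routine: it uses textbook RSA without padding, an alphabet of letters and digits, and the helper names are hypothetical. Some published re-implementations also reverse the string before encrypting. The article does not say either way, so compare the output with a captured encSecKey before relying on it.

```python
import secrets
import string

PUBLIC_EXPONENT_HEX = "010001"  # the second parameter
MODULUS_HEX = "00e0b509f6259df8642dbc35662901477df22677ec152b5ff68ace615bb7b725152b3ab17a876aea8a5aa76d2e417629ec4ee341f56135fccf695280104e0312ecbda92557c93870114af6c9d05c4f7f0c3685b7a46bee255932575cce10b424d813cfe4875d3e82047b97ddef52741d546b8e289dc6935b3ece0462db0a22b8e7"  # the third parameter


def random_key(length=16):
    """Return a random string of letters and digits (alphabet is an assumption)."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))


def rsa_encrypt_hex(text, exponent_hex, modulus_hex):
    """Textbook RSA, c = m^e mod n, returned as hex padded to the modulus size."""
    m = int.from_bytes(text.encode("utf-8"), "big")
    e = int(exponent_hex, 16)
    n = int(modulus_hex, 16)
    if m >= n:
        raise ValueError("message is too long for this modulus")
    c = pow(m, e, n)
    width = (n.bit_length() + 7) // 8 * 2  # hex digits needed for the modulus
    return format(c, "x").zfill(width)


key = random_key()
enc_sec_key = rsa_encrypt_hex(key, PUBLIC_EXPONENT_HEX, MODULUS_HEX)
print(key, len(enc_sec_key))
```

The same key value would also serve as the AES key of the second encryption pass, so it should be generated once and reused.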

With the analysis finally done, it is time to write the code.

Start with the code that gets the comments on the first page.

This is the class that generates the two parameters.

This is the class that requests NetEase Cloud Music and extracts the comments.

However, as soon as I ran it, it reported an error: TypeError: can't concat str to bytes

The cause is that in the second encryption the params value is of type bytes, so it has to be converted to a string first.
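The error can be reproduced without any encryption library. In the sketch below, pad is a hypothetical padding helper written for str input, and the base64 output stands in for the result of the first encryption. Both are assumptions about how the lost code was shaped.

```python
import base64

BLOCK_SIZE = 16  # AES block size in bytes


def pad(text):
    """PKCS#7-style padding written for str input (ASCII text assumed)."""
    n = BLOCK_SIZE - len(text) % BLOCK_SIZE
    return text + chr(n) * n


# base64.b64encode returns bytes, like the output of the first encryption pass.
first_pass = base64.b64encode(b"stand-in for the first ciphertext")

try:
    pad(first_pass)  # bytes + str raises TypeError
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Decoding to str before the second pass avoids the error.
fixed = pad(first_pass.decode("utf-8"))
print(error_message)
print(len(fixed) % BLOCK_SIZE)
```

Decoding with UTF-8 is safe here because base64 output contains only ASCII characters.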

Run it again, and there is still an error: json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

This error means the JSON parsing failed. Debugging showed that the response body was empty even though the status code was 200. I then copied the values of the two parameters directly, as before, and that worked, which showed that my encryption process was wrong. I reread a few articles and could not see anything wrong. Then I found a Zhihu article through Baidu, copied its code, and it ran fine. Comparing the two, I found that I had used the 16 random characters incorrectly: I had passed two different random strings where the same one must be used in both places. After changing this, the code ran successfully, so I will not post the code again. The result is as follows.

The next step is to get the comments on every page. Each page is determined by the offset in the first parameter, where offset = (page number - 1) * 20; total is true on the first page and false on the other pages.

Then run it, but this exception occurs on page 8.

raise errorclass(errno, errval)

pymysql.err.InternalError: (1366, "Incorrect string value: '\xF0\x9F\x92\x94' for column 'content' at row 1")
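The bytes in this error message decode to a single character that takes four bytes in UTF-8. MySQL's legacy utf8 character set stores at most three bytes per character, so a common remedy is to use utf8mb4 for both the column and the connection. The article's own fix is not shown, so the sketch below only demonstrates the cause and two hypothetical helpers for detecting or dropping such characters.

```python
# The bytes quoted in the error message.
raw = b"\xF0\x9F\x92\x94"
char = raw.decode("utf-8")
print(f"U+{ord(char):04X} takes {len(raw)} bytes in UTF-8")


def needs_utf8mb4(text):
    """True if the text contains a character outside the Basic Multilingual Plane."""
    return any(ord(ch) > 0xFFFF for ch in text)


def strip_non_bmp(text):
    """Drop characters that need four bytes in UTF-8 (a lossy fallback)."""
    return "".join(ch for ch in text if ord(ch) <= 0xFFFF)


comment = "sad " + char
print(needs_utf8mb4(comment))
print(repr(strip_non_bmp(comment)))
```

Stripping loses information, so switching the table and the connection to utf8mb4 is usually the better choice when the schema can be changed.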

This is how the first page of comments looks in the database.

Fetching finished (are there really so few comments on Jiaju's songs? Emmmm...)

That is everything in "How to crawl NetEase Cloud Music with Python". Thank you for reading! I hope it has given you a working understanding and that the content is helpful.
