
How to decrypt window.__DATA__ on the search page

2025-01-17 Update From: SLTechnology News&Howtos (shulou)


This article explains in detail how the window.__DATA__ value on the search page is decrypted. I hope you come away with a working understanding of the topic.

After my last post on simulating login to the latest version of Zhihu, many people read it and said it was good, but nobody gave it a like. So I will trouble you again: if you see an article that is useful to you, please like it to support the author.

What I want to share with you this time is Douban's book search page.

https://book.douban.com/subject_search?search_text=%E7%BD%91%E7%BB%9C%E6%98%AF%E6%80%8E%E6%A0%B7%E8%BF%9E%E6%8E%A5%E7%9A%84&cat=1001

I used to think Douban was only a practice target for people new to writing crawlers, and it was not until I found this page that I realized I was wrong. Douban encrypts its data too, possibly to keep novice crawlers out. Seeing this, I feel crawling is getting harder and harder: almost every page now has JS encryption.

I also think this one is harder than Zhihu last time. It may be, though, that only the search page is encrypted; on the other pages the book data appears to sit directly in the HTML source.

Enough talk. No fear, just do it. Let's start the analysis.

1. Find the search content

The book data is encrypted and embedded directly in the HTML page. Finding it on your own could take a long time; I learned from an expert that it is stored in window.__DATA__ on this page.

You can also find it yourself if you look carefully; it just takes some time. With this experience, the next time you cannot find a page's data, try looking for it in the HTML of the page itself.
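Once you know where the data lives, pulling the encrypted string out of the page source can be sketched roughly as below. This is only an illustration: the function name `extract_window_data` and the sample HTML are made up for this example, and it assumes the value is assigned as a single double-quoted string literal, which you should confirm against the real page.

```python
import re
from typing import Optional

# Assumption: the payload is assigned as one double-quoted string literal,
# e.g.  window.__DATA__ = "....";
_DATA_RE = re.compile(r'window\.__DATA__\s*=\s*"([^"]+)"')


def extract_window_data(html: str) -> Optional[str]:
    """Return the raw (still encrypted) window.__DATA__ string, or None if absent."""
    match = _DATA_RE.search(html)
    return match.group(1) if match else None


# Made-up sample page; the real value is a much longer encrypted string.
sample_html = '<script>window.__DATA__ = "c2FtcGxl";</script>'
encrypted = extract_window_data(sample_html)
```

The result is still ciphertext; it only becomes readable after the decryption step analyzed in the next sections.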

2. Find the decryption location

Just search for window.__DATA__.

The first hit is the HTML content itself, so the one we want is the next hit. Try it yourself if you don't believe it.

Set a breakpoint there and take a look: sure enough, the data comes out.

From there, step through slowly in the debugger to see which methods are called.

After a few steps it is easy to reach the code that is effectively the decryption step for the method above.

The next step is to read this JS yourself; it is not difficult. If you are experienced, you may recognize the encryption methods it uses and can reimplement them directly with the equivalent methods in Python. If you cannot work it out, you need to extract the relevant JS yourself.

If you are new to JS and have just learned about JS prototypes, I strongly recommend working through this one. The JS here is spread across two files, so it is not as simple as extracting a single function, as with Zhihu last time; after extracting the code you also have to fix up the prototype relationships between functions and objects. In short, it is quite complicated and can only be understood by doing it. Space is limited, so I cannot show every piece I extracted and how I changed it.

You have to practice it yourself to get a feel for it. In the end I spent a few days extracting 1500+ lines of code.

3. Run with Python

If you run the extracted JS from Python with execjs, you will hit this problem:

UnicodeDecodeError: 'gbk' codec can't decode byte 0x80 in position 41: illegal multibyte sequence

This happens because the output is read through a TextIOWrapper object that is created without an explicit encoding, so it falls back to the system default, which here is cp936 (that is, GBK). Decoding the UTF-8 output as GBK fails. The fix is to set the encoding to utf-8 when that class is initialized.

Run it again and it succeeds.
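One way to apply this fix without editing library files is sketched below. It rests on an assumption not stated above: that execjs starts its JS runtime through `subprocess.Popen` in text mode, so the class to change is `Popen`. The names `Utf8Popen` and `read_child_output` are made up for this example.

```python
import subprocess
import sys


class Utf8Popen(subprocess.Popen):
    """Popen variant whose text-mode pipes default to UTF-8, not the locale encoding."""

    def __init__(self, *args, **kwargs):
        # Only touch text-mode calls; binary pipes are left as they are.
        if kwargs.get("universal_newlines") or kwargs.get("text"):
            kwargs.setdefault("encoding", "utf-8")
        super().__init__(*args, **kwargs)


def read_child_output(code: str) -> str:
    """Run a Python snippet in a child process and return its decoded stdout."""
    proc = Utf8Popen(
        [sys.executable, "-c", code],
        stdout=subprocess.PIPE,
        universal_newlines=True,
    )
    out, _ = proc.communicate()
    return out


# The child writes raw UTF-8 bytes for two Chinese characters; with a GBK
# default these would be mis-decoded or raise UnicodeDecodeError.
child_code = "import sys; sys.stdout.buffer.write(b'\\xe8\\xb1\\x86\\xe7\\x93\\xa3')"
decoded = read_child_output(child_code)
```

Under the same assumption, assigning `subprocess.Popen = Utf8Popen` before the first `import execjs` should let execjs pick up the UTF-8 default. A subclass is used here rather than `functools.partial` so that code which subclasses `Popen` or checks `isinstance` against it keeps working.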

That is all on how to decrypt window.__DATA__ on the search page. I hope it is of some help. If you think the article is good, share it so more people can see it.
