What is the role of Python implicit Style-CSS in anti-crawlers

Updated 2025-01-20. From: SLTechnology News&Howtos (shulou), Internet Technology

Shulou (Shulou.com), 06/02

This article explains the role that implicit Style-CSS plays in anti-crawler protection, and how a crawler can recover the content it hides.

Application of implicit Style-CSS in anti-crawling

What is implicit Style-CSS

Let's start with what implicit Style-CSS is:

In CSS, ::before creates a pseudo-element that is the first child of the selected element. It is often used to add cosmetic content to an element with the content property.

Source: https://developer.mozilla.org

The quotation is a little abstract, so let's demonstrate it with a short example.

Let's create a new HTML file and enter something like this:

<q>Hello everyone, I am Salted Fish</q>, <q>I am a member of NightTeam</q>

(adapted from: https://developer.mozilla.org)

And reference the following style file in the HTML:

q::before { content: "«"; color: blue; }
q::after { content: "»"; color: red; }

(from: https://developer.mozilla.org)

Finally, the content displayed in the browser is as follows:

[figure: the rendered page, with « and » displayed around the quoted text; derived from: https://developer.mozilla.org]

In this example the symbols before and after the text do not appear in the HTML source at all. They exist only in the stylesheet, yet the browser displays them normally.
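As a minimal Python sketch of why this trips up a crawler: text extracted from the HTML alone never contains the « and » symbols, because they live only in the stylesheet. The HTML and CSS strings below are illustrative stand-ins modelled on the example above, and the regular expression is an assumption that only handles simple rules of this shape, not CSS in general.

```python
import re
from html.parser import HTMLParser

# Illustrative stand-ins for the page and stylesheet in the example.
HTML = "<q>Hello everyone, I am Salted Fish</q>"
CSS = 'q::before { content: "«"; color: blue; } q::after { content: "»"; color: red; }'


class TextOnly(HTMLParser):
    """Collects text nodes only, as a naive crawler would."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def source_text(html):
    """Return the text found in the HTML source, ignoring all styling."""
    parser = TextOnly()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


def pseudo_content(css):
    """Map (selector, pseudo-element) to its content string for simple rules."""
    pattern = re.compile(
        r'([\w.#-]+)::(before|after)\s*\{[^}]*?content:\s*"([^"]*)"'
    )
    return {(sel, pseudo): text for sel, pseudo, text in pattern.findall(css)}


naive = source_text(HTML)
rules = pseudo_content(CSS)
# What the browser displays: generated content wrapped around the source text.
rendered = rules[("q", "before")] + naive + rules[("q", "after")]

print(naive)
print(rendered)
```

The gap between `naive` and `rendered` is exactly what the anti-crawling technique relies on.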

Many online fiction websites currently use this technique to protect their content from crawlers.

So how should a crawler deal with anti-crawling techniques like this?

Worked example

Salted Fish has prepared a simple hands-on example to show how to deal with this kind of anti-crawling.

The example runs locally, so there are no requests to analyze. We go straight to comparing what the browser displays with the page source to look for a way in.

The browser shows:

The source code shows:

In the page source, part of the text has been replaced with span tags.

Page analysis

Open the browser's developer tools and look at how the hidden text is represented. [figure 2-1]

The value of the content property in box 2 of [figure 2-1] is exactly the text that is hidden in the page source at box 1.

This matches the implicit Style-CSS example from the first part.

So to recover the full text, we only need to replace each span tag with the content value shown in box 2 of [figure 2-1].
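The replacement step can be sketched in Python as follows. This is a hedged illustration: the exact span markup is not shown in the article, so the pattern assumes empty span tags whose only attribute is a context_kw class, and the sample text and mapping are made up.

```python
import re

# Assumption: each hidden piece of text is an empty span such as
# <span class="context_kw0"></span>; the real markup may differ.
SPAN_PATTERN = re.compile(r'<span\s+class="(context_kw\d+)"\s*>\s*</span>')


def restore_text(html, content_by_class):
    """Replace each placeholder span with the CSS content value for its class."""

    def substitute(match):
        class_name = match.group(1)
        # Leave the span untouched if no content value is known for it.
        return content_by_class.get(class_name, match.group(0))

    return SPAN_PATTERN.sub(substitute, html)


# Made-up sample data for illustration.
sample = (
    '<p>The <span class="context_kw0"></span> is '
    '<span class="context_kw1"></span>.</p>'
)
mapping = {"context_kw0": "sky", "context_kw1": "blue"}

restored = restore_text(sample, mapping)
print(restored)
```

The remaining work is obtaining the class-to-content mapping, which is what the rest of the analysis is about.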

In a normal page structure, clicking the location boxed in [figure 2-2] jumps straight to the CSS file.

Our page has no such clickable location, so the only way forward is to look for a pattern in the span tags.

Every span tag has a class name made of context_kw followed by a number, so let's search for context_kw.
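To see how many distinct placeholders a page uses, the class names can be collected with a short Python sketch. The sample markup is hypothetical; only the context_kw-plus-number naming comes from the page analysis.

```python
import re


def keyword_indices(html):
    """Return the sorted, de-duplicated numbers found in context_kw class names."""
    found = re.findall(r'class="context_kw(\d+)"', html)
    return sorted({int(number) for number in found})


# Hypothetical page fragment for illustration.
page = (
    '<span class="context_kw2"></span> text '
    '<span class="context_kw0"></span> more text '
    '<span class="context_kw2"></span>'
)

indices = keyword_indices(page)
print(indices)
```

Knowing the indices makes it easy to check later that every placeholder has a recovered value.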

The search finds JS code related to context_kw in the file. [figure 2-3]

Skimming through the whole script, it divides into two parts by function. [figure 2-4]

The first part is the CryptoJS encryption and decryption logic, which can be ignored.

The second part, whose variable names have been obfuscated, decrypts the ciphertext in the array and manipulates the DOM. This is where JS and CSS are combined to carry out the main anti-crawling logic.

Partial encryption analysis

From the DOM-manipulating code in the second part, we find the key variable words.

for (var i = 0x0; i < words[_0xea12('0x18')]; i++) {
    try {
        document[_0xea12('0x2a')][0x0][_0xea12('0x2b')]('.context_kw' + i + _0xea12('0x2c'), 'content:\x20\x22' + words[i] + '\x22');
    } catch (_0x527f83) {
        document['styleSheets'][0x0]['insertRule'](_0xea12('0x2d') + i + _0xea12('0x2e') + words[i] + '\x22}', document[_0xea12('0x2a')][0x0][_0xea12('0x2f')][_0xea12('0x18')]);
    }
}
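Read in plain terms, the loop adds one CSS rule per entry of words, keyed by the context_kw class with the same index. The Python sketch below mirrors that behaviour under stated assumptions: the decoded value of the obfuscated selector suffix is not shown in the article, so '::before' is a guess based on the first part, and the words are made up.

```python
import re

# Assumption: the obfuscated selector suffix decodes to "::before".
SELECTOR_SUFFIX = "::before"


def build_rules(words):
    """Build one CSS rule per word, as the obfuscated loop appears to do."""
    return [
        '.context_kw%d%s { content: "%s" }' % (index, SELECTOR_SUFFIX, word)
        for index, word in enumerate(words)
    ]


def rules_to_mapping(rules):
    """Invert the rules into a class-name to content mapping for a crawler."""
    pattern = re.compile(
        r'\.(context_kw\d+)::before\s*\{\s*content:\s*"([^"]*)"\s*\}'
    )
    mapping = {}
    for rule in rules:
        match = pattern.fullmatch(rule)
        if match:
            mapping[match.group(1)] = match.group(2)
    return mapping


# Made-up words for illustration.
rules = build_rules(["sky", "blue"])
content_by_class = rules_to_mapping(rules)
print(rules)
print(content_by_class)
```

This shows why recovering the words array is enough: the class name of each span is just the array index.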

Next, find where the variable words is declared.

var secWords = decrypted[_0xea12('0x16')](CryptoJS['enc']['Utf8'])[_0xea12('0x17')](',');
var words = new Array(secWords[_0xea12('0x18')]);
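A rough Python equivalent of these two lines is sketched below. It assumes the three obfuscated lookups decode to toString, split and length, which the shape of the code suggests but the article does not confirm, and it takes already-decrypted bytes as input rather than performing the AES step.

```python
def split_decrypted(decrypted):
    """Decode decrypted bytes as UTF-8 and split them on commas.

    Returns the split values plus an empty list of the same length, which
    mirrors `new Array(secWords.length)` in the original script.
    """
    sec_words = decrypted.decode("utf-8").split(",")
    words = [None] * len(sec_words)
    return sec_words, words


# Made-up plaintext standing in for the AES output.
sec_words, words = split_decrypted(b"10,20,30")
print(sec_words, words)
```

How each entry of the empty array is then filled from the split values is part of the script's further processing and is not reproduced here.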

Tracing back in this way, we find that each CSS content value comes from an encrypted element in the array _0xa12e, which is decrypted with AES and then processed.

With this logic mapped out, we can start extracting the JS code we need.

Modifying the encryption code

The code is fairly simple, so the extraction steps are not shown in detail. Instead, here are the two places that need to be rewritten after the code has been extracted.

The first is the exception handling in [figure 2-5], where the code checks the current URL. Node has no window object, so running the code unmodified raises an exception. The if statement here needs to be commented out.

The second is the condition in the return statement in [figure 2-6], which also checks properties that do not exist in Node, so it needs to be modified as well.

For example:

_0x1532b6[_0xea12('0x26')](_0x490c80, 0x3 * +!('object' = _0xea12('0x27')

After these two changes, the extracted code runs in Node and yields all of the characters that were replaced.
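One way a Python crawler might consume that result is sketched below. It assumes the extracted script has been adapted to print the recovered words to stdout as a JSON array, which is not something the article shows, and the file name extracted.js is hypothetical. So that the sketch runs without Node installed, the demonstration uses a Python one-liner as a stand-in command.

```python
import json
import subprocess
import sys


def load_words(command):
    """Run a command and parse its stdout as a JSON array of words."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    words = json.loads(result.stdout)
    if not isinstance(words, list):
        raise ValueError("expected a JSON array of words")
    return words


# Stand-in for the real call; in practice the command would be something
# like ["node", "extracted.js"] (hypothetical file name).
demo_command = [
    sys.executable,
    "-c",
    "import json; print(json.dumps(['sky', 'blue']))",
]

demo_words = load_words(demo_command)
# Index i in the array corresponds to the class name context_kw<i>.
content_by_class = {
    "context_kw%d" % index: word for index, word in enumerate(demo_words)
}
print(content_by_class)
```

The resulting mapping is what gets substituted back into the span tags of the page source.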
