In this article, we walk through writing several kinds of URL collector scripts in Python. The content is detailed and analyzed from a practical point of view; I hope you get something out of it.
0x02 ZoomEye API scripting
ZoomEye is a search engine for cyberspace that indexes information about devices, websites, and the services and components they run across the Internet.
ZoomEye has two major detection engines, Xmap and Wmap, which identify the services and components used by Internet devices and websites through round-the-clock probing. With ZoomEye, researchers can easily gauge how widely a component is deployed and how far a vulnerability reaches.
Although it is billed as a "hacker-friendly" search engine, ZoomEye does not actively attack network devices or websites, and the data it collects is used only for security research. ZoomEye is more like a nautical chart of Internet space.
First log in, then get the access_token:
# -*- coding: UTF-8 -*-
import requests
import json

user = raw_input('[-] PLEASE INPUT YOUR USERNAME:')
passwd = raw_input('[-] PLEASE INPUT YOUR PASSWORD:')

def Login():
    # POST the credentials as JSON and pull the access_token out of the response
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        return '[-] INFO: USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN'
    return access_token

if __name__ == '__main__':
    print Login()
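The token returned above then goes into an Authorization header on every later request. As a quick sanity check, something like the following sketch can query your remaining quota; the resources-info endpoint name is my assumption based on a reading of the ZoomEye API docs, so treat it as such:

def check_quota():
    # assumption: ZoomEye exposes a resources-info endpoint that reports plan and quota
    headers = {'Authorization': 'JWT ' + Login()}
    r = requests.get(url='https://api.zoomeye.org/resources-info', headers=headers)
    print r.text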
The API manual describes the search interfaces; following it, we first write a collector for a single page of HOST results.
# -*- coding: UTF-8 -*-
import requests
import json

user = raw_input('[-] PLEASE INPUT YOUR USERNAME:')
passwd = raw_input('[-] PLEASE INPUT YOUR PASSWORD:')

def Login():
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        return '[-] INFO: USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN'
    return access_token

def search():
    # the JWT token from Login() authorizes the host search request
    headers = {'Authorization': 'JWT ' + Login()}
    r = requests.get(url='https://api.zoomeye.org/host/search?query=tomcat&page=1', headers=headers)
    response = json.loads(r.text)
    print response

if __name__ == '__main__':
    search()
The response is huge, but it is just JSON data, so we can extract only the IP portion:
for x in response['matches']:
    print x['ip']
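If you would rather keep the results than just print them, a minimal sketch (the output file name is arbitrary) collects the IPs into a list and writes them out:

ips = []
for x in response['matches']:
    ips.append(x['ip'])

# write one IP per line to an arbitrarily named output file
with open('zoomeye_ips.txt', 'w') as f:
    f.write('\n'.join(ips) + '\n')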
With that, single-page collection for HOST is done. The WEB version is much the same and I will leave the analysis to you; the full code is posted further below, but a rough sketch follows here.
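As a sketch of the WEB variant (the endpoint and the printed field are taken from the full script further below; everything else matches the HOST version):

def search_web():
    headers = {'Authorization': 'JWT ' + Login()}
    r = requests.get(url='https://api.zoomeye.org/web/search?query=tomcat&page=1', headers=headers)
    response = json.loads(r.text)
    for x in response['matches']:
        print x['ip'][0]  # web results carry a list of IPs; print the first one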
Next, use a for loop to fetch IPs from multiple pages:
# -*- coding: UTF-8 -*-
import requests
import json

def Login():
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        return '[-] INFO: USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN'
    return access_token

def search():
    headers = {'Authorization': 'JWT ' + Login()}
    for i in range(1, int(PAGECOUNT) + 1):  # pages 1 .. PAGECOUNT
        r = requests.get(url='https://api.zoomeye.org/host/search?query=tomcat&page=' + str(i), headers=headers)
        response = json.loads(r.text)
        for x in response['matches']:
            print x['ip']

if __name__ == '__main__':
    user = raw_input('[-] PLEASE INPUT YOUR USERNAME:')
    passwd = raw_input('[-] PLEASE INPUT YOUR PASSWORD:')
    PAGECOUNT = raw_input('[-] PLEASE INPUT YOUR SEARCH_PAGE_COUNT (eg:10):')
    search()
This pulls the data for as many pages as you want. Finally, here is the polished, complete version of the code:
# -*- coding: UTF-8 -*-
import requests
import json

def Login(user, passwd):
    data_info = {'username': user, 'password': passwd}
    data_encoded = json.dumps(data_info)
    respond = requests.post(url='https://api.zoomeye.org/user/login', data=data_encoded)
    try:
        r_decoded = json.loads(respond.text)
        access_token = r_decoded['access_token']
    except KeyError:
        return '[-] INFO: USERNAME OR PASSWORD IS WRONG, PLEASE TRY AGAIN'
    return access_token

def search(queryType, queryStr, PAGECOUNT, user, passwd):
    headers = {'Authorization': 'JWT ' + Login(user, passwd)}
    for i in range(1, int(PAGECOUNT) + 1):  # pages 1 .. PAGECOUNT
        r = requests.get(url='https://api.zoomeye.org/' + queryType + '/search?query=' + queryStr + '&page=' + str(i), headers=headers)
        response = json.loads(r.text)
        try:
            if queryType == "host":
                for x in response['matches']:
                    print x['ip']
            if queryType == "web":
                for x in response['matches']:
                    print x['ip'][0]
        except KeyError:
            print "[ERROR] No hosts found"

def main():
    # banner
    print '=' * 44
    print '          ZoomEye API Collector'
    print '=' * 44
    user = raw_input('[-] PLEASE INPUT YOUR USERNAME:')
    passwd = raw_input('[-] PLEASE INPUT YOUR PASSWORD:')
    PAGECOUNT = raw_input('[-] PLEASE INPUT YOUR SEARCH_PAGE_COUNT (eg:10):')
    queryType = raw_input('[-] PLEASE INPUT YOUR SEARCH_TYPE (eg:web/host):')
    queryStr = raw_input('[-] PLEASE INPUT YOUR KEYWORD (eg:tomcat):')
    Login(user, passwd)
    search(queryType, queryStr, PAGECOUNT, user, passwd)

if __name__ == '__main__':
    main()
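If you prefer to drive the finished functions from code rather than through the interactive prompts, a minimal example (hypothetical credentials and query values) would be:

# hypothetical account and query; replace with your own
# collects the first 5 pages of web-type results for "tomcat"
search('web', 'tomcat', 5, 'your_username', 'your_password')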
0x03 Shodan API scripting
Shodan has been called the scariest search engine on the Internet.
CNNMoney wrote in an article that while Google is generally considered the most powerful search engine, Shodan is the scariest one on the Internet.
Unlike Google, Shodan does not crawl the web for addresses; it probes the Internet's back channels directly. Shodan is something like a "dark" Google, constantly indexing the servers, cameras, printers, routers and other devices connected to the Internet. Every month, Shodan gathers information around the clock on roughly half a billion connected servers and devices.
The information Shodan gathers is astonishing. Traffic lights, security cameras, home automation devices and heating systems connected to the Internet can all be found with ease. Shodan users have found the control system of a water park, a gas station, and even a hotel wine cooler. Researchers have also used Shodan to locate a power plant's command and control system and a particle-accelerating cyclotron.
What makes Shodan truly noteworthy is that it can find almost anything connected to the Internet; what makes it truly scary is that almost none of these devices have any security in place, and they can be accessed at will.
Another author has already written this up, and the introduction there is very detailed.
Portal: a Python implementation based on the ShodanAPI interface.
Let's start with querying the raw API. Official documentation: http://shodan.readthedocs.io/en/latest/tutorial.html
Each query deducts one query credit, but queries made through the shodan library module do not.
Here is a simple example; it is much the same as the ZoomEye one, so I won't go into detail.
# -*- coding: UTF-8 -*-
import requests
import json

def getip():
    API_KEY = '*'  # redacted; put your own Shodan API key here
    url = 'https://api.shodan.io/shodan/host/search?key=' + API_KEY + '&query=apache'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    req = requests.get(url=url, headers=headers)
    content = json.loads(req.text)
    for i in content['matches']:
        print i['ip_str']

if __name__ == '__main__':
    getip()
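To pull more than the first page from the raw API, the host/search endpoint also accepts a page parameter. A sketch of the paged version (note that Shodan charges query credits for additional pages):

def getip_pages(api_key, query, pages):
    headers = {'User-Agent': 'Mozilla/5.0'}
    for page in range(1, pages + 1):
        url = ('https://api.shodan.io/shodan/host/search?key=' + api_key +
               '&query=' + query + '&page=' + str(page))
        content = json.loads(requests.get(url=url, headers=headers).text)
        for i in content.get('matches', []):
            print i['ip_str']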
Next is the version based on the shodan module, quoted directly from another author's script, so I won't rewrite it from scratch.
Installation: pip install shodan
# -*- coding: UTF-8 -*-
import shodan
import sys

API_KEY = 'YOU_API_KEY'  # your Shodan API key
FACETS = [
    ('country', 100),  # top 100 countries by number of matches; the 100 is customizable
]
FACET_TITLES = {
    'country': 'Top 100 Countries',
}

# input check
if len(sys.argv) == 1:
    print 'Search Method: input %s and then the keyword' % sys.argv[0]
    sys.exit()

try:
    api = shodan.Shodan(API_KEY)
    query = ' '.join(sys.argv[1:])
    print 'You Search is: ' + query
    result = api.count(query, facets=FACETS)  # count() is faster than search()
    for facet in result['facets']:
        print FACET_TITLES[facet]
        for key in result['facets'][facet]:
            countrie = '%s: %s' % (key['value'], key['count'])
            print countrie
            with open(u'search ' + query + u' keyword' + '.txt', 'a') as f:
                f.write(countrie + '\n')
    print ''
    print 'Saved to: search %s keyword.txt' % query
    print 'Search is Complete.'
except Exception, e:
    print 'Error: %s' % e
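The shodan module can also return the matching hosts directly rather than just facet counts. A minimal sketch using its search() call (same placeholder API key as above):

import shodan

api = shodan.Shodan('YOU_API_KEY')  # placeholder key, as above
try:
    results = api.search('apache')
    print 'Results found: %s' % results['total']
    for match in results['matches']:
        print match['ip_str']
except shodan.APIError, e:
    print 'Error: %s' % e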
0x04 Simple Baidu URL collection script
First, crawl the URLs from a single results page. As an example, crawl the results for the keyword "Brother Ah Fu":
# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup as bs
import re

def getfromBaidu(word):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    url = 'https://www.baidu.com.cn/s?wd=' + word + '&pn=1'
    html = requests.get(url=url, headers=headers, timeout=5)
    soup = bs(html.content, 'lxml', from_encoding='utf-8')
    # Baidu result links carry a data-click attribute; follow each one to get the real URL
    bqs = soup.find_all(name='a', attrs={'data-click': re.compile(r'.'), 'class': None})
    for i in bqs:
        r = requests.get(i['href'], headers=headers, timeout=5)
        print r.url

if __name__ == '__main__':
    getfromBaidu('Brother Ah Fu')
Then comes multi-page crawling, for example crawling the first 20 pages:
# -*- coding: UTF-8 -*-
import requests
from bs4 import BeautifulSoup as bs
import re

def getfromBaidu(word, pageout):
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    for k in range(0, (pageout - 1) * 10, 10):  # Baidu's pn parameter steps by 10 per page
        url = 'https://www.baidu.com.cn/s?wd=' + word + '&pn=' + str(k)
        html = requests.get(url=url, headers=headers, timeout=5)
        soup = bs(html.content, 'lxml', from_encoding='utf-8')
        bqs = soup.find_all(name='a', attrs={'data-click': re.compile(r'.'), 'class': None})
        for i in bqs:
            r = requests.get(i['href'], headers=headers, timeout=5)
            print r.url

if __name__ == '__main__':
    getfromBaidu('Brother Ah Fu', 10)
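The resolved URLs tend to repeat across pages, so a small standard-library follow-up (assuming you first collect the printed URLs into a list) can reduce them to unique hostnames:

from urlparse import urlparse  # Python 2; on Python 3 use urllib.parse instead

def unique_hosts(urls):
    # urls: a list of resolved result URLs collected from getfromBaidu
    hosts = set()
    for u in urls:
        hosts.add(urlparse(u).netloc)
    return sorted(hosts)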
0x05 [Easter egg] Forum automatic check-in script
I actually posted this before, but in case some people missed it, I am sharing it again.
Checking in earns quite a few magic coins. For the various ways to earn them, see:
https://bbs.ichunqiu.com/thread-36007-1-1.html
To use it, you only need to change the COOKIE to your own.
It checks in automatically at 24:00 every day; just leave it running on a server.
# -*- coding: UTF-8 -*-
import requests
import datetime
import time
import re

def sign():
    url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign'
    # partially redacted; replace with your own cookie values
    cookie = {'_jsluid': '3e29e6c*8966d9e0a481220',
              'UM_distinctid': '1605f635c78159*016-5d4e211f-1fa400-1605f635c7ac0',
              'pgv_pvi': '4680553472'}
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.87'}
    r = requests.get(url=url, cookies=cookie, headers=headers)
    # pull the formhash token out of the page; adjust the pattern if the markup differs
    rows = re.findall(r'name="formhash" value="(.*?)"', r.content)
    if len(rows) != 0:
        formhash = rows[0]
        print '[-] Formhash is: ' + formhash
    else:
        print '[-] No formhash found.'
    # the original checks for the Chinese "already checked in today or check-in has not started" message
    if 'you have already checked in today or the check-in time has not started yet' in r.text:
        print '[-] Already signed in.'
    else:
        sign_url = 'https://bbs.ichunqiu.com/plugin.php?id=dsu_paulsign:sign&operation=qiandao&infloat=1&inajax=1'
        sign_payload = {
            'formhash': formhash,
            'qdxq': 'fd',
            'qdmode': '2',
            'todaysay': '',
            'fastreply': 0,
        }
        sign_req = requests.post(url=sign_url, data=sign_payload, headers=headers, cookies=cookie)
        # the original checks for the Chinese "check-in successful" message
        if 'check in successfully' in sign_req.text:
            print '[-] Sign in successful!'
        else:
            print '[-] Something error...'
    time.sleep(60)  # wait out the minute so we do not try to sign twice

def main(h=0, m=0):
    # wake up every 20 seconds and sign in when the clock reaches h:m (midnight by default)
    while True:
        while True:
            now = datetime.datetime.now()
            if now.hour == h and now.minute == m:
                break
            time.sleep(20)
        sign()

if __name__ == '__main__':
    main()

That is what writing these various URL collector scripts in Python looks like. If you have run into similar questions, the analysis above should help you work through them.