This article introduces how to use Python to collect web quality data into an Excel table. Many people run into difficulties when doing this in practice, so let the editor walk you through how to handle these situations. I hope you read it carefully and come away with something useful!
Preparation before Python scripting:
Download the pycurl module and double-click the installer to install it.
Install the xlsxwriter module with the pip command; note that this requires the environment variables (such as PATH) to be configured correctly.
1. Since pycurl is installed directly from the downloaded installer, which is quite simple, the steps are not repeated here.
2. Install the xlsxwriter module (an Internet connection is required), for example with the command below.
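As a quick, hedged illustration (not taken from the original article), running pip through the Python interpreter sidesteps most PATH problems, because only python itself needs to be found through the environment variables:

python -m pip install xlsxwriter

If the command finishes without errors, import xlsxwriter should then work in the Python interpreter.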
3. The script for collecting data is as follows:
# _*_ coding:utf-8 _*_
import os, sys
import pycurl
import xlsxwriter

URL = "www.baidu.com"                    # URL of the target to detect; change it here as needed
c = pycurl.Curl()                        # create a curl object
c.setopt(pycurl.URL, URL)                # set the request URL
c.setopt(pycurl.CONNECTTIMEOUT, 10)      # wait time for the request connection
c.setopt(pycurl.TIMEOUT, 10)             # request timeout
c.setopt(pycurl.NOPROGRESS, 1)           # suppress the download progress bar
c.setopt(pycurl.FORBID_REUSE, 1)         # force disconnection after the interaction, do not reuse the connection
c.setopt(pycurl.MAXREDIRS, 1)            # allow at most 1 HTTP redirect
c.setopt(pycurl.DNS_CACHE_TIMEOUT, 30)   # keep DNS cache information for 30 seconds

# create a file object opened in 'wb' mode to store the returned HTTP header and page content
indexfile = open(os.path.dirname(os.path.realpath(__file__)) + "/content.txt", "wb")
c.setopt(pycurl.WRITEHEADER, indexfile)  # direct the returned HTTP header to indexfile
c.setopt(pycurl.WRITEDATA, indexfile)    # direct the returned HTML content to indexfile
c.perform()

NAMELOOKUP_TIME = c.getinfo(c.NAMELOOKUP_TIME)  # get the DNS resolution time
CONNECT_TIME = c.getinfo(c.CONNECT_TIME)        # get the connection time
TOTAL_TIME = c.getinfo(c.TOTAL_TIME)            # get the total transfer time
HTTP_CODE = c.getinfo(c.HTTP_CODE)              # get the HTTP status code
SIZE_DOWNLOAD = c.getinfo(c.SIZE_DOWNLOAD)      # get the downloaded data size
HEADER_SIZE = c.getinfo(c.HEADER_SIZE)          # get the HTTP header size
SPEED_DOWNLOAD = c.getinfo(c.SPEED_DOWNLOAD)    # get the average download speed

print u"HTTP status code: %s" % (HTTP_CODE)                          # output the status code
print u"DNS resolution time: %.2f ms" % (NAMELOOKUP_TIME * 1000)     # output the DNS resolution time
print u"Connection time: %.2f ms" % (CONNECT_TIME * 1000)            # output the connection time
print u"Total transfer time: %.2f ms" % (TOTAL_TIME * 1000)          # output the total transfer time
print u"Downloaded data size: %d bytes" % (SIZE_DOWNLOAD)            # output the downloaded data size
print u"HTTP header size: %d bytes" % (HEADER_SIZE)                  # output the HTTP header size
print u"Average download speed: %d bytes/s" % (SPEED_DOWNLOAD)       # output the average download speed

indexfile.close()  # close the file
c.close()          # close the curl object

f = file('chart.txt', 'a')  # open chart.txt in append mode
# append the metrics above as one comma-separated line
f.write(str(HTTP_CODE) + ',' + str(NAMELOOKUP_TIME * 1000) + ',' + str(CONNECT_TIME * 1000) + ','
        + str(TOTAL_TIME * 1000) + ',' + str(SIZE_DOWNLOAD / 1024) + ',' + str(HEADER_SIZE) + ','
        + str(SPEED_DOWNLOAD / 1024) + '\n')
f.close()  # close chart.txt

workbook = xlsxwriter.Workbook('chart.xlsx')    # create the chart.xlsx Excel file
worksheet = workbook.add_worksheet()            # create a worksheet object, Sheet1 by default
chart = workbook.add_chart({'type': 'column'})  # create a column chart object

# header row for the data table
title = [URL, u'HTTP status code', u'DNS resolution time', u'connection time', u'transfer end time',
         u'download size', u'HTTP header size', u'average download speed']

format = workbook.add_format()         # format object for the data cells
format.set_border(1)                   # 1-pixel cell border
format_title = workbook.add_format()   # format object for the header row
format_title.set_border(1)             # 1-pixel cell border
format_title.set_bg_color('#00FF00')   # header cell background color
format_title.set_align('center')       # center the header cells
format_title.set_bold()                # bold header text

worksheet.write_row(0, 0, title, format_title)  # write the header into the first row

f = open('chart.txt', 'r')  # open chart.txt for reading
line = 1                    # the worksheet row to write next
for i in f:                 # read the file line by line
    head = [line]                        # first column: the detection number
    lineList = i.split(',')              # split the comma-separated string into a list
    # strip the trailing '\n', drop the decimals and convert each value to an integer
    lineList = map(lambda i2: int(float(i2.replace("\n", ''))), lineList)
    lineList = head + lineList
    worksheet.write_row(line, 0, lineList, format)  # write the row into the Excel sheet
    line += 1

# compute the average of every column (the data sits in Excel rows 2 through line)
average = [u'average', '=AVERAGE(B2:B' + str(line) + ')', '=AVERAGE(C2:C' + str(line) + ')',
           '=AVERAGE(D2:D' + str(line) + ')', '=AVERAGE(E2:E' + str(line) + ')',
           '=AVERAGE(F2:F' + str(line) + ')', '=AVERAGE(G2:G' + str(line) + ')',
           '=AVERAGE(H2:H' + str(line) + ')']
worksheet.write_row(line, 0, average, format)  # write the average row
f.close()  # close chart.txt

def chart_series(cur_row, line):  # add one chart series per detection row
    chart.add_series({
        'categories': '=Sheet1!$B$1:$H$1',                     # chart data labels (X axis): the header row
        'values': '=Sheet1!$B$' + cur_row + ':$H$' + cur_row,  # data from column B to column H of this row
        'line': {'color': 'black'},                            # series color defined as black
        'name': '=Sheet1!$A$' + cur_row,                       # legend item: the detection number in column A
    })

for row in range(2, line + 1):  # call chart_series for every data row
    chart_series(str(row), line)

chart.set_size({'width': 876, 'height': 287})  # define the chart width and height
worksheet.insert_chart(line + 2, 0, chart)     # insert the chart two rows below the last row of data
workbook.close()                               # close the Excel document
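The script above is written for Python 2 (print statements and the file() built-in). As a minimal, hedged sketch, and assuming pycurl is also installed under Python 3, the same timing metrics can be collected as follows; the pycurl constants and getinfo() calls are unchanged, only the printing and body buffering differ:

# A minimal Python 3 sketch of the same pycurl timing collection (an illustration,
# not the article's script); assumes pycurl is installed for Python 3.
import pycurl
from io import BytesIO

URL = "www.baidu.com"                # target URL, same as in the script above
buf = BytesIO()                      # hold the response body in memory instead of a file

c = pycurl.Curl()
c.setopt(pycurl.URL, URL)
c.setopt(pycurl.CONNECTTIMEOUT, 10)  # connection wait time (seconds)
c.setopt(pycurl.TIMEOUT, 10)         # overall request timeout (seconds)
c.setopt(pycurl.NOPROGRESS, 1)       # suppress the progress meter
c.setopt(pycurl.WRITEDATA, buf)      # write the body into the BytesIO buffer
c.perform()

print("HTTP status code: %s" % c.getinfo(pycurl.HTTP_CODE))
print("DNS resolution time: %.2f ms" % (c.getinfo(pycurl.NAMELOOKUP_TIME) * 1000))
print("Connection time: %.2f ms" % (c.getinfo(pycurl.CONNECT_TIME) * 1000))
print("Total transfer time: %.2f ms" % (c.getinfo(pycurl.TOTAL_TIME) * 1000))
print("Download size: %d bytes" % c.getinfo(pycurl.SIZE_DOWNLOAD))
print("Average download speed: %d bytes/s" % c.getinfo(pycurl.SPEED_DOWNLOAD))
c.close()

Porting the rest of the script needs only small changes, such as wrapping map() in list() and replacing file() with open().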
4. After the script runs, three files are generated in the same directory as the script: two are txt text files and one is an Excel file. While the script executes, it prints the status code, timing, size and speed information from the print statements above to the console.
5. The files generated in the current directory are content.txt, chart.txt, and chart.xlsx.
Of these, the two txt files only serve as intermediate data for the Excel output, so you can safely ignore them; the data in Excel is what matters. The Excel table holds one row per detection, and the example result was produced after executing the script 6 times, that is, after 6 detections.
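Because each run appends one comma-separated row to chart.txt and then rebuilds chart.xlsx from all accumulated rows, repeated detections can be driven by a small wrapper. The sketch below is a hypothetical helper, not part of the article; the script file name web_quality.py and the interval are assumptions:

# Hypothetical wrapper (not from the article): run the collection script several
# times so chart.txt accumulates one row per detection.
import subprocess
import sys
import time

RUNS = 6        # the article's example result comes from 6 detections
INTERVAL = 60   # seconds to wait between detections; adjust as needed (assumption)

for n in range(RUNS):
    subprocess.call([sys.executable, "web_quality.py"])  # one detection appends one row to chart.txt
    if n < RUNS - 1:
        time.sleep(INTERVAL)                             # wait before the next detection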
"how to use Python to collect web quality data to Excel table" content is introduced here, thank you for reading. If you want to know more about the industry, you can follow the website, the editor will output more high-quality practical articles for you!