
How to Monitor and Measure Websites with Python


This article explains how to monitor and measure websites with Python. The content is simple and easy to follow, so please work through it step by step.

A brief introduction to Web Vitals

In early May, Google launched Core Web Vitals, a subset of its key Web Vitals metrics.

These metrics are used to provide guidance on the quality of user experience on the site.

Google describes them as a way to "help quantify your site experience and identify opportunities for improvement", further highlighting its shift towards focusing on user experience.

Core Web Vitals are real-world, user-centric metrics that measure key aspects of the user experience: loading, interactivity and visual stability.

In addition, Google announced last week that it will introduce a new search ranking signal that combines these metrics with existing page experience signals, such as mobile-friendliness and HTTPS security, to ensure it continues to surface high-quality websites to users.

Monitoring performance metrics

This update is expected to be released in 2021, and Google has confirmed that no immediate action is required.

However, to help us prepare for these changes, they updated the tools used to measure page speed, including PageSpeed Insights (PSI), Google Lighthouse and the Google Search Console Speed report.

Getting started with the PageSpeed Insights API

Google's PageSpeed Insights is a useful tool for viewing a summary of a web page's performance, using both field and lab data to generate results.

It is a good way to get an overview of a handful of URLs, since it works one page at a time.

However, if you work on a large site and want insight at scale, the API makes it possible to analyse many pages at once, without having to paste in the URLs one by one.

A Python script for measuring performance

I created the following Python script to measure key performance metrics on a large scale to save the time spent manually testing each URL.

The script uses Python to send requests to the Google PSI API, collecting and extracting the metrics that are displayed in PSI and Lighthouse.

I decided to write this script in Google Colab because it is an easy way to start writing Python and makes sharing simple, so this article uses Google Colab throughout the setup.

However, it can also be run locally with some adjustments to how data is uploaded and downloaded.
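For reference, a minimal sketch of those adjustments (the filenames here are placeholders, not from the original): locally you read the CSV straight from disk and write the results back to disk, instead of using the google.colab files helpers.

# Local alternative to the Colab upload/download helpers (a sketch; filenames are placeholders)
import pandas as pd

df = pd.read_csv('urls.csv')  # replaces files.upload() and io.BytesIO in step 5
# ... the rest of the script runs unchanged ...
df_pagespeed_results.to_csv('pagespeed_results.csv')  # replaces files.download() in step 9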

It is important to note that some steps can take a while to complete, especially while each URL runs through the API, because the script pauses between requests so as not to overload the service.

You can therefore run the script in the background and come back to it once the steps have finished.

Let's walk through the steps required to start and run this script.

Step 1: install the required software packages

Before we start writing any code, we need a few Python packages for the script to work. In Colab these are all available and simply need to be imported.

The software package we need is:

urllib: used to handle, open, read and parse URLs.

json: converts JSON to Python objects and Python objects to JSON.

requests: an HTTP library for sending all kinds of HTTP requests.

pandas: mainly used for data analysis and manipulation; we use it to create DataFrames.

time: a module for working with time; we use it to add an interval between requests.

files: from Google Colab, lets you upload and download files.

io: the default interface used for accessing files.

# Import required packages
import json
import requests
import pandas as pd
import urllib
import time
from google.colab import files
import io

Step 2: set up the API request

The next step is to set up the API request. The full description can be found here, but in essence, the command will look like this:

https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={yourURL}&strategy=mobile&key={yourAPIKey}

This lets you append the URL, the strategy (desktop or mobile), and your API key.

To use it in Python, we use the urllib request library's urllib.request.urlopen and store the output in a variable named result, so we can keep the results and reuse them later in the script.

# Define URL
url = 'https://www.example.co.uk'

# API request url
result = urllib.request.urlopen('https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy=mobile'\
    .format(url)).read().decode('UTF-8')

print(result)

Step 3: test the API

To test that the API is set up correctly and to see what is generated during a test, I ran a single URL through the API using the simple urllib.request method.

Once that was done, I converted the result to a JSON file and downloaded it to review the output.

# Convert to json format
result_json = json.loads(result)
print(result_json)

with open('result.json', 'w') as outfile:
    json.dump(result_json, outfile)

files.download('result.json')

(note that this method is used to convert and download JSON files in Google Colab.)

Step 4: read the JSON file

The JSON file displays field data (stored under loadingExperience) and lab data (which can be found under lighthouseResult).

To extract the metrics we want, we can follow the structure of the JSON file, since each metric sits under one of these sections.
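For example, here is a minimal sketch (assuming the result_json object from step 3 is still in memory) of how you might peek into those two sections; the key names are the ones returned by the PSI API:

# Field data collected from real users (when available)
print(result_json['loadingExperience']['metrics'].keys())

# Lab data generated by the Lighthouse run
print(result_json['lighthouseResult']['audits'].keys())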

Step 5: upload a CSV and store it as a Pandas DataFrame

The next step is to upload a CSV file of the URLs we want to run through the PSI API. You can generate a list of your site's URLs with a crawling tool such as DeepCrawl.

When using the API, it is recommended to start with a smaller sample of URLs, especially if you have a large site.

For example, you could use the pages with the most visits or the pages that generate the most revenue. If your site uses templates, a representative page from each template is ideal for testing.

You can also define the column-header variable here, which we will use when iterating over the list. Make sure this name matches the column header in the CSV file you uploaded:

uploaded = files.upload()

# If your column header is something other than 'url', please define it here
column_header = 'url'

(note that this method is used to upload CSV files in Google Colab.)

After uploading it, we will use the Pandas library to convert CSV to DataFrame, which we can iterate through in the following steps.

# Get the filename from the upload so we can read it into a CSV
for key in uploaded.keys():
    filename = key

# Read the selected file into a Pandas DataFrame
df = pd.read_csv(io.BytesIO(uploaded[filename]))
df.head()

The DataFrame looks like this, with its index starting at zero.

Step 6: save the results to the response object

The next step involves using a for loop to iterate over the DataFrame of URLs you just created, running each one through the PSI API.

The for loop lets us iterate through the uploaded list and execute the commands for each item. We can then save the results to a response object and convert them to JSON.

response_object = {}

# Iterate through the df
for x in range(0, len(df)):

    # Define request parameter
    url = df.iloc[x][column_header]

    # Make request
    pagespeed_results = urllib.request.urlopen('https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy=mobile'.format(url)).read().decode('UTF-8')

    # Convert to json format
    pagespeed_results_json = json.loads(pagespeed_results)

    # Insert returned json response into response_object
    response_object[url] = pagespeed_results_json

    time.sleep(30)

    print(response_object[url])

We use the range variable x here, which represents the URL currently being processed in the loop, and range(0, len(df)) allows the loop to work through all the URLs in the DataFrame, no matter how many it contains.

The response object prevents the URLs from overwriting one another as the loop runs, so we can save the data for future use.

This is also where the URL request parameter is defined using the column-header variable, before the response is converted to JSON.

I also set the sleep time here to 30 seconds to reduce the number of consecutive API calls.

In addition, if you want to make requests more quickly, you can append an API key to the end of the request URL, as in the sketch below.
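As a sketch (YOUR_API_KEY is a placeholder for a key generated in the Google API console, not something from the original script), the request inside the loop would then look like this:

# Hypothetical example of adding an API key to the request URL
api_key = 'YOUR_API_KEY'  # placeholder
pagespeed_results = urllib.request.urlopen(
    'https://www.googleapis.com/pagespeedonline/v5/runPagespeed?url={}&strategy=mobile&key={}'
    .format(url, api_key)).read().decode('UTF-8')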

Indentation matters here as well: each of these steps is part of the for loop, so the lines must be indented inside it.

Step 7: create a DataFrame to store the responses

We also need to create a DataFrame to store the metrics we want to extract from the response object.

DataFrame is a table-like data structure with columns and rows that store data. We just need to add a column for each metric and name it appropriately, as follows:

# Create dataframe to store responses
df_pagespeed_results = pd.DataFrame(columns=[
    'url',
    'Overall_Category',
    'Largest_Contentful_Paint',
    'First_Input_Delay',
    'Cumulative_Layout_Shift',
    'First_Contentful_Paint',
    'Time_to_Interactive',
    'Total_Blocking_Time',
    'Speed_Index'])

print(df_pagespeed_results)

For the purposes of this script, I used Core Web Vital metrics as well as other load and interactivity metrics used in the current Lighthouse version.

These metrics carry different weights, which are then combined into the overall performance score:

LCP (Largest Contentful Paint)

FID (First Input Delay)

CLS (Cumulative Layout Shift)

FCP (First Contentful Paint)

TTI (Time to Interactive)

TBT (Total Blocking Time)

You can find more information about each metric and how to interpret the score on the target pages linked above.

I also chose to include Speed Index and the overall category, which reports a slow, average or fast score.

Step 8: extract metrics from the response object

After saving the response object, we can now filter it and extract only the metrics we need.

Here, we will again use a for loop to iterate through the response object and set a series of keys to return only the specific metrics we need.

To do this, we define the DataFrame column name and the specific part of the response object from which each metric is extracted, for every URL.

for (url, x) in zip(response_object.keys(), range(0, len(response_object))):

    # URLs
    df_pagespeed_results.loc[x, 'url'] =\
        response_object[url]['lighthouseResult']['finalUrl']

    # Overall Category
    df_pagespeed_results.loc[x, 'Overall_Category'] =\
        response_object[url]['loadingExperience']['overall_category']

    # Core Web Vitals

    # Largest Contentful Paint
    df_pagespeed_results.loc[x, 'Largest_Contentful_Paint'] =\
        response_object[url]['lighthouseResult']['audits']['largest-contentful-paint']['displayValue']

    # First Input Delay
    fid = response_object[url]['loadingExperience']['metrics']['FIRST_INPUT_DELAY_MS']
    df_pagespeed_results.loc[x, 'First_Input_Delay'] = fid['percentile']

    # Cumulative Layout Shift
    df_pagespeed_results.loc[x, 'Cumulative_Layout_Shift'] =\
        response_object[url]['lighthouseResult']['audits']['cumulative-layout-shift']['displayValue']

    # Additional Loading Metrics

    # First Contentful Paint
    df_pagespeed_results.loc[x, 'First_Contentful_Paint'] =\
        response_object[url]['lighthouseResult']['audits']['first-contentful-paint']['displayValue']

    # Additional Interactivity Metrics

    # Time to Interactive
    df_pagespeed_results.loc[x, 'Time_to_Interactive'] =\
        response_object[url]['lighthouseResult']['audits']['interactive']['displayValue']

    # Total Blocking Time
    df_pagespeed_results.loc[x, 'Total_Blocking_Time'] =\
        response_object[url]['lighthouseResult']['audits']['total-blocking-time']['displayValue']

    # Speed Index
    df_pagespeed_results.loc[x, 'Speed_Index'] =\
        response_object[url]['lighthouseResult']['audits']['speed-index']['displayValue']

I have set this script to extract the key metrics mentioned above, so you can use it to collect this data immediately.

However, you can extract many other useful metrics that can be found in both PSI tests and Lighthouse analysis.

The JSON file can be used to see where each metric sits in the response.
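For instance, a minimal sketch (assuming the response_object from step 6 is still in memory) that prints one full audit entry so you can see which fields it exposes, such as the displayValue used above:

# Print a single Lighthouse audit object to inspect its available fields
print(response_object[url]['lighthouseResult']['audits']['interactive'])

# Or list every audit key available for a given URL
for audit_name in response_object[url]['lighthouseResult']['audits'].keys():
    print(audit_name)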

For example, when extracting a metric such as the display value of Time to Interactive from a Lighthouse audit, the following is used:

df_pagespeed_results.loc[x, 'Time_to_Interactive'] =\
    response_object[url]['lighthouseResult']['audits']['interactive']['displayValue']

Once again, it is important to make sure that each of these lines sits inside the for loop; otherwise they will not be included in the iteration and only one URL's result will be generated.

Step 9: convert DataFrame to CSV file

The final step is to create a summary file that collects all the results, converted into a format that is easy to analyse, such as a CSV file.

summary = df_pagespeed_results
df_pagespeed_results.head()

# Download csv file
summary.to_csv('pagespeed_results.csv')
files.download('pagespeed_results.csv')

(note that this method is used to convert and download CSV files in Google Colab.)

Further explore the data

Currently, all metrics we export are stored as strings, which is the Python data type for text and characters.

Because some of the metrics we extract are actually numeric values, you may want to convert strings to numeric data types, such as integers and floating-point numbers.

Integers (int) are whole numbers, such as 1 and 10.

Floating-point numbers (float) are numbers with a decimal point, such as 1.0 and 10.1.

To convert a string to a number, we need two steps. The first step is to replace the 's' character (used to represent seconds) with a blank space.

We do this by using the .str.replace method on each column.

# Replace the 's' with a blank space so we can turn into numbers
df_pagespeed_results['Largest_Contentful_Paint'] = df_pagespeed_results.Largest_Contentful_Paint.str.replace('s', '')
df_pagespeed_results['First_Contentful_Paint'] = df_pagespeed_results.First_Contentful_Paint.str.replace('s', '')
df_pagespeed_results['Time_to_Interactive'] = df_pagespeed_results.Time_to_Interactive.str.replace('s', '')
df_pagespeed_results['Total_Blocking_Time'] = df_pagespeed_results.Total_Blocking_Time.str.replace('ms', '')
df_pagespeed_results['Speed_Index'] = df_pagespeed_results.Speed_Index.str.replace('ms', '')

Then we will use the .astype () method to convert the string to an integer or floating-point number:

# Turn strings into integers or floats
df_pagespeed_results['Largest_Contentful_Paint'] = df_pagespeed_results.Largest_Contentful_Paint.astype(float)
df_pagespeed_results['Cumulative_Layout_Shift'] = df_pagespeed_results.Cumulative_Layout_Shift.astype(int)
df_pagespeed_results['First_Contentful_Paint'] = df_pagespeed_results.First_Contentful_Paint.astype(float)
df_pagespeed_results['Time_to_Interactive'] = df_pagespeed_results.Time_to_Interactive.astype(float)
df_pagespeed_results['Speed_Index'] = df_pagespeed_results.Speed_Index.astype(float)

After you have done this, you can use a number of different methods to further evaluate the data.

For example, you can use data visualization libraries such as matplotlib or seaborn to visualize metrics and measure how metrics change over time and group results into slow, medium, and fast buckets.
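As one illustration, a minimal sketch (assuming matplotlib is available, as it is by default in Colab) that plots the distribution of Largest Contentful Paint values from the summary DataFrame:

import matplotlib.pyplot as plt

# Histogram of Largest Contentful Paint across the sampled URLs
df_pagespeed_results['Largest_Contentful_Paint'].plot(kind='hist', bins=20)
plt.xlabel('Largest Contentful Paint (seconds)')
plt.title('LCP distribution')
plt.show()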

Thank you for reading. That covers how to monitor and measure websites with Python; after studying this article you should have a deeper understanding of the topic, and the specifics are best verified in practice.
