Updated 2025-04-01. From: SLTechnology News&Howtos
Shulou(Shulou.com)06/03 Report--
This article shows how to use Python to analyze your online behavior by parsing your Chrome browsing history. It is offered as a reference for interested readers.
Brief introduction
This is a Chrome browsing history analyzer that gives you an overview of your browsing history. It only works with Chrome or with browsers built on the Chrome core.
On its page you can view top 10 rankings of the domain names and URLs you have visited and of your busiest days, along with the related charts.
Code walkthrough
1. Directory structure
First, let's take a look at the overall directory structure.
Code
├─ app_callback.py        callback functions; implements the back-end logic
├─ app_configuration.py   web server configuration
├─ app_layout.py          web front-end page layout
├─ app_plot.py            web chart drawing
├─ app.py                 starts the web server
├─ assets                 static resource files the web front end needs
│  ├─ css                 front-end element layout files
│  │  ├─ custum-styles_phyloapp.css
│  │  └─ stylesheet.css
│  ├─ image               front-end logo icons
│  │  └─ GitHub-Mark-Light.png
│  └─ static              front-end help page
│     ├─ help.html
│     └─ help.md
├─ history_data.py        parses the Chrome history file
└─ requirement.txt        libraries the program depends on
app_callback.py
The program is written in Python and deployed with Dash, a lightweight web framework. app_callback.py holds the callbacks, which can be thought of as the back-end logic.
app_configuration.py
As the name implies, this holds the configuration for the web server.
app_layout.py
Web front-end page layout, including HTML and CSS elements.
app_plot.py
Builds the charts shown in the web front end.
app.py
Starts the web server.
assets
Static resource directory, which stores the static files the front end needs.
history_data.py
Connects to the SQLite database and parses the Chrome history file.
requirement.txt
The libraries the program depends on.
2. Parsing history file data
The history file is parsed by history_data.py. We go through its functions one by one.
import sqlite3

# Query the database
def query_sqlite_db(history_db, query):
    # Connect to the SQLite database
    # Note: History is a file with no extension; it is not a directory
    conn = sqlite3.connect(history_db)
    cursor = conn.cursor()
    # An SQLite viewer shows the fields of the visits and urls tables clearly
    # The query joins the urls and visits tables to get the required data
    select_statement = query
    # Execute the query
    cursor.execute(select_statement)
    # Fetch the data; each row is a tuple
    results = cursor.fetchall()
    # Close the cursor and the connection
    cursor.close()
    conn.close()
    return results
The code flow of this function is:
Connect to the SQLite database, execute the query, fetch the results, and finally close the cursor and the database connection.
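The connect, execute, fetch, close flow above can also be written with `contextlib.closing`, so the connection is closed even if the query raises. This is a minimal sketch, assuming a throwaway database with made-up `urls` rows standing in for a real History file; `query_db` and `make_demo_db` are illustrative names, not part of the project.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def query_db(db_path, query, params=()):
    """Run a query and return all rows as a list of tuples."""
    # closing() guarantees close() is called even if execute() raises.
    with closing(sqlite3.connect(db_path)) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(query, params)
            return cursor.fetchall()


def make_demo_db():
    """Create a small stand-in database (made-up data, not a real History file)."""
    fd, path = tempfile.mkstemp(suffix=".sqlite")
    os.close(fd)
    with closing(sqlite3.connect(path)) as conn:
        conn.execute(
            "CREATE TABLE urls (id INTEGER PRIMARY KEY, url TEXT, visit_count INTEGER)"
        )
        conn.executemany(
            "INSERT INTO urls (id, url, visit_count) VALUES (?, ?, ?)",
            [(1, "https://example.com/", 3), (2, "https://example.org/docs", 1)],
        )
        conn.commit()
    return path


demo_path = make_demo_db()
rows = query_db(demo_path, "SELECT id, url, visit_count FROM urls ORDER BY id")
print(rows)
os.remove(demo_path)
```

Chrome usually keeps the History file locked while the browser is running, so it tends to be safer to work on a copy of the file.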
# Get the sorted history data
def get_history_data(history_file_path):
    try:
        # Query the database; each row is a tuple
        select_statement = "SELECT urls.id, urls.url, urls.title, urls.last_visit_time, urls.visit_count, visits.visit_time, visits.from_visit, visits.transition, visits.visit_duration FROM urls, visits WHERE urls.id = visits.url;"
        result = query_sqlite_db(history_file_path, select_statement)
        # Sort the rows
        # sort and sorted compare the first element first, then the second, and so on
        result_sort = sorted(result, key=lambda x: (x[0], x[1], x[2], x[3], x[4], x[5], x[6], x[7], x[8]))
        # Return the sorted data
        return result_sort
    except Exception:
        # print('read error!')
        return 'error'
The code flow of this function is:
Set the query statement select_statement and call query_sqlite_db() to read the rows from the history file. The rows are then sorted field by field, starting with the first. The function returns the sorted rows, or the string 'error' if anything fails.
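Because Python compares tuples element by element, a key that lists every field produces the same order as calling sorted with no key at all. The small sketch below uses made-up three-field rows (not real history data) to illustrate this.

```python
# Made-up rows in (id, url, visit_time) form; real rows have nine fields.
rows = [
    (2, "https://example.org/", 300),
    (1, "https://example.com/", 200),
    (1, "https://example.com/", 100),
]

# Explicit key listing every field, in the style of get_history_data().
by_key = sorted(rows, key=lambda x: (x[0], x[1], x[2]))

# Default tuple ordering: first element, then second, then third.
by_default = sorted(rows)

print(by_key == by_default)
print(by_default[0])
```

Rows with the same id are ordered by the later fields, so the visits to one URL end up grouped together and ordered by visit time.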
3. Basic configuration of web server
The basic configuration of the web server lives in app_configuration.py and app.py. It covers the server's port number, access permissions, static resource directory, and so on.
4. Front-end page deployment
The front end is built from app_layout.py, app_plot.py, and the assets directory.
The front-end layout mainly includes the following elements:
Upload history file component
Page visit count ranking component
Total time spent per page ranking component
Daily page visits scatter chart component
Visits by time of day scatter chart component
Top 10 most-visited URLs component
Search keyword ranking component
Search engine usage component
In app_layout.py, most of these components are configured much like ordinary HTML and CSS, so we take only the page visit count ranking component as an example.
# Page visit count ranking
html.Div(
    style={'margin-bottom': '150px'},
    children=[
        html.Div(
            style={'border-top-style': 'solid', 'border-bottom-style': 'solid'},
            className='row',
            children=[
                html.Span(
                    children='Page visits ranking, ',
                    style={'font-weight': 'bold', 'color': 'red'}
                ),
                html.Span(
                    children='number shown: ',
                ),
                dcc.Input(
                    id='input_website_count_rank',
                    type='text',
                    value=10,
                    style={'margin-top': '10px', 'margin-bottom': '10px'}
                ),
            ]
        ),
        html.Div(
            style={'position': 'relative', 'margin': '0 auto', 'width': '100%', 'padding-bottom': '50%'},
            children=[
                dcc.Loading(
                    children=[
                        dcc.Graph(
                            id='graph_website_count_rank',
                            style={'position': 'absolute', 'width': '100%', 'height': '100%',
                                   'top': '0', 'left': '0', 'bottom': '0', 'right': '0'},
                            config={'displayModeBar': False},
                        ),
                    ],
                    type='dot',
                    style={'position': 'absolute', 'top': '50%', 'left': '50%',
                           'transform': 'translate(-50%, -50%)'}
                ),
            ],
        )
    ]
)
As you can see, although the layout is written in Python, anyone with front-end experience can easily add or remove elements on this basis, so we won't go into detail about HTML and CSS.
app_plot.py is mainly concerned with drawing charts. It uses plotly, a library of interactive chart components for the web.
As an example, here is how to draw the page visit frequency ranking bar chart with plotly.
import plotly.graph_objs as go

# Draw a bar chart of the page visit frequency ranking
def plot_bar_website_count_rank(value, history_data):
    # Frequency dictionary
    dict_data = {}
    # Traverse the history data
    for data in history_data:
        url = data[1]
        # Simplify the URL
        key = url_simplification(url)
        if key in dict_data:
            dict_data[key] += 1
        else:
            # The first visit counts as 1 (the original initialized it to 0)
            dict_data[key] = 1
    # Keep the k most frequent entries
    k = convert_to_number(value)
    top_10_dict = get_top_k_from_dict(dict_data, k)
    figure = go.Figure(
        data=[
            go.Bar(
                x=list(top_10_dict.keys()),
                y=list(top_10_dict.values()),
                name='bar',
                marker=go.bar.Marker(color='rgb(55, 83, 109)')
            )
        ],
        layout=go.Layout(
            showlegend=False,
            margin=go.layout.Margin(l=40),
            # paper_bgcolor and plot_bgcolor are also set here;
            # their rgba values are truncated in the source
            xaxis=dict(title='website'),
            yaxis=dict(title='count')
        )
    )
    return figure
The code flow of this function is:
First, the history_data returned by parsing the database file is traversed to get each URL, and url_simplification(url) is called to simplify it. The simplified URL is then counted in the dictionary.
Call get_top_k_from_dict(dict_data, k) to get the k entries with the largest values from the dictionary dict_data.
Then draw the bar chart with go.Bar(), where x and y are lists holding the categories and their corresponding values. xaxis and yaxis set the titles of the corresponding axes.
Finally, return a figure object to be sent to the front end.
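The counting and top-k steps can be sketched with the standard library alone. The article does not show url_simplification, convert_to_number, or get_top_k_from_dict, so the versions below are assumptions about their behavior (reduce a URL to its host, parse the text box value with a fallback, keep the k most frequent keys), not the project's actual code.

```python
from collections import Counter
from urllib.parse import urlsplit


def url_simplification(url):
    """Hypothetical helper: reduce a URL to its host name."""
    host = urlsplit(url).netloc
    # Fall back to the raw string for values without a host (e.g. "about:blank").
    return host or url


def convert_to_number(value, default=10):
    """Hypothetical helper: parse the text-box value, falling back to a default."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return number if number > 0 else default


def get_top_k_from_dict(dict_data, k):
    """Hypothetical helper: keep the k entries with the highest counts."""
    return dict(Counter(dict_data).most_common(k))


# Made-up rows in (id, url, ...) form, mirroring data[1] in the function above.
history_data = [
    (1, "https://example.com/a"),
    (2, "https://example.com/b"),
    (3, "https://example.org/"),
    (4, "https://example.com/c"),
]

# Counter starts every key at zero and adds one per visit.
counts = Counter(url_simplification(row[1]) for row in history_data)
top = get_top_k_from_dict(counts, convert_to_number("1"))
print(top)
```

Using Counter avoids the manual "is the key already present" branch, which is where an off-by-one in the initial count can slip in.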
The assets directory contains the images and CSS files used by the front-end layout.
5. Back-end deployment
The back end is implemented in app_callback.py. This file uses callbacks to update the front-end page.
First, let's look at the callback for the page visit ranking:
# Page visit frequency ranking
@app.callback(
    dash.dependencies.Output('graph_website_count_rank', 'figure'),
    [
        dash.dependencies.Input('input_website_count_rank', 'value'),
        dash.dependencies.Input('store_memory_history_data', 'data')
    ]
)
def update(value, store_memory_history_data):
    # History data is available
    if store_memory_history_data:
        history_data = store_memory_history_data['history_data']
        figure = plot_bar_website_count_rank(value, history_data)
        return figure
    else:
        # Cancel the page update
        raise dash.exceptions.PreventUpdate("cancel the callback")
The code flow of this function is:
First, decide what the inputs are (the data that triggers the callback) and what the output is (the data the callback produces). dash.dependencies.Input names the data that triggers the callback: dash.dependencies.Input('input_website_count_rank', 'value') means the callback fires whenever the value of the component whose id is input_website_count_rank changes. The result of update(value, store_memory_history_data) is written to the figure property of the component whose id is graph_website_count_rank, which redraws the chart.
Now for def update(value, store_memory_history_data) itself. It first checks that the input store_memory_history_data is not empty, then reads history_data from it, calls plot_bar_website_count_rank() from app_plot.py as described above, and returns the resulting figure object to the front end. At that point the page shows the page visit frequency ranking chart.
The other part worth covering is how an uploaded file is processed. Here is the code first:
import base64
import random
import time
from os import makedirs
from os.path import exists

# Upload file callback
@app.callback(
    dash.dependencies.Output('store_memory_history_data', 'data'),
    [
        dash.dependencies.Input('dcc_upload_file', 'contents')
    ]
)
def update(contents):
    if contents is not None:
        # Receive the base64-encoded data
        content_type, content_string = contents.split(',')
        # base64-decode the file uploaded by the client
        decoded = base64.b64decode(content_string)
        # Add a suffix to the uploaded file's name so files do not overwrite each other
        suffix = [str(random.randint(0, 100)) for i in range(10)]
        suffix = "".join(suffix)
        suffix = suffix + str(int(time.time()))
        # Final file name
        file_name = 'History_' + suffix
        # print(file_name)
        # Create the directory where files are stored
        if not exists('data'):
            makedirs('data')
        # Path of the file to write
        path = 'data' + '/' + file_name
        # Write the file to the local disk
        with open(file=path, mode='wb+') as f:
            f.write(decoded)
        # Use sqlite to read the local file
        # Get the history data
        history_data = get_history_data(path)
        # Get the search keyword data
        search_word = get_search_word(path)
        # Check whether the data was read correctly
        if history_data != 'error':
            date_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
            print('new data received from the client, data correct, time: {}'.format(date_time))
            store_data = {'history_data': history_data, 'search_word': search_word}
            return store_data
        else:
            date_time = time.strftime('%Y-%m-%d %H:%M:%S', time.localtime(time.time()))
            print('new data received from the client, data error, time: {}'.format(date_time))
            return None
    return None
The code flow of this function is:
First, check that the contents uploaded by the user is not empty, then base64-decode the uploaded file. Next, add a suffix to the file name so that uploads do not overwrite one another, and write the uploaded file to the local disk.
After writing, use sqlite to read the local file. If it is read correctly, the parsed data is returned; otherwise None is returned.
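The decode-and-save steps can be tried without a running server. The Dash upload component delivers contents as a data URL of the form data:<type>;base64,<payload>, so splitting on the first comma separates the header from the payload. In the sketch below, save_upload and the use of uuid4 for unique names are assumptions chosen for illustration; the code above builds its suffix from random digits plus a timestamp.

```python
import base64
import os
import tempfile
import uuid


def save_upload(contents, directory):
    """Decode a 'data:<type>;base64,<payload>' string and write it to disk."""
    # Split on the first comma only; the base64 payload never contains one.
    content_type, content_string = contents.split(",", 1)
    decoded = base64.b64decode(content_string)

    # exist_ok=True replaces the separate exists() check.
    os.makedirs(directory, exist_ok=True)

    # uuid4 gives a practically unique name (an illustrative substitute for
    # the random-digits-plus-timestamp suffix).
    path = os.path.join(directory, "History_" + uuid.uuid4().hex)
    with open(path, "wb") as f:
        f.write(decoded)
    return path


# Build a made-up upload payload the way a browser would encode it.
payload = b"SQLite format 3\x00 demo bytes"
contents = (
    "data:application/octet-stream;base64,"
    + base64.b64encode(payload).decode("ascii")
)

with tempfile.TemporaryDirectory() as tmp:
    saved_path = save_upload(contents, os.path.join(tmp, "data"))
    with open(saved_path, "rb") as f:
        round_trip = f.read()

print(round_trip == payload)
```

The bytes read back from disk match the bytes that were encoded, which is the property the callback relies on before handing the file to sqlite.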
How to run
Online demo: http://39.106.118.77:8090 (an ordinary server; please do not load-test it)
Running the program is simple. Just run the following commands:
# Change to the project directory
cd <directory name>
# Uninstall the dependencies
pip uninstall -y -r requirement.txt
# Reinstall the dependencies
pip install -r requirement.txt
# Start the program
python app.py
# Once it has started successfully, open http://localhost:8090 in a browser

Thank you for reading. I hope this article on how to use Python to analyze your online behavior is helpful to you.