Network Security Internet Technology Development Database Servers Mobile Phone Android Software Apple Software Computer Software News IT Information

In addition to Weibo, there is also WeChat

Please pay attention

WeChat public account

Shulou

How to use SQL and Python to query the flow of people respectively

2025-01-23 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Database >

Share

Shulou(Shulou.com)05/31 Report--

This article mainly explains "how to use SQL and Python to achieve human flow query", the content of the article is simple and clear, easy to learn and understand, the following please follow the editor's ideas slowly in depth, together to study and learn "how to use SQL and Python respectively to achieve human flow query" bar!

Case introduction

The case comes from LeetCode, which is a common requirement in time series data.

The daily passenger flow information of a city gymnasium is recorded in three columns of information in the stadium table: serial number (id), date (visit_date) and passenger flow (people). Find out at least three consecutive rows of records of not less than 100 people flow.

Train of thought analysis

The simplest idea is certainly to do three Cartesian product joins on the stadium table, but this approach is not desirable when there is a large amount of data, and it is not generalized (for example, the requirement is changed to at least ten consecutive rows). Ali's programming specification, which forbids connections of more than three meters, is also circulating on the Internet.

In short, this kind of thinking is not what we should take, we need to find other ideas.

(1) Construction of arithmetic sequence

From the figure above, we can find a rule that the row numbers of the data regions that meet the conditions are equidifference series in both the original table and the result table, and the difference between the two arithmetic sequences is fixed. For example, the difference between series A1 and B1 is 1; the difference between series A2 and B2 is 2.

As long as we ensure that the difference of the difference series of each region is not equal, then we can filter the regions that meet the criteria by screening the number of times the difference occurs. For example, if the difference 2 appears four times and meets the condition, then the record corresponding to the difference is the data we need.

There are other ways to build the difference in addition to line numbering, and you can think about it.

(2) data slicing

As you can see from the figure, if_true is a secondary column, indicating whether the condition is met, and 1 is True,0 and False. If we want to select the regions that meet the conditions, we can slice the column with 0 to get decimal sequences of different lengths that are all 1, and filter the regions that meet the conditions according to the length of each decimal sequence.

In the diagram, we get the series of lengths an and b, and find out the areas that meet the conditions by calculating the length of the series.

Program realization

In the last section, we chose two ways of thinking, of which both Python ideas can be realized, and SQL can achieve the first one. This section implements the first idea with SQL and the second with Python.

(1) SQL

Select id,visit_date,people from (select t2.counting (1) over (partition by rn2) rn3 from (selectt1.*,rn1-row_number () over (order by visit_date) rn2 from (select *, row_number () over () as rn1 from stadium order by visit_date) T1 # T1 table generates row number where people > = 100) T2 # T2 table filters data with no less than 100 people. And subtract the newly generated row number from the original row number to get the difference where 1: 1) T3 # T3 table to calculate the number of times of each type of difference where rn3 > 2 # the data that is more than 2 is the required data

Because the ID in the actual table is hardly contiguous, it becomes a line number to ensure generalization, so you don't have to rely on ID.

In addition, it can also be achieved through user variables and other ways, you can try to think about it.

(2) Python

Import pandas as pd dt=pd.DataFrame ({"id": range (1jue 9), "visit_date": pd.date_range (start= "2017-01-01", periods=8) "people": [10 counter' 109 counter' 150 99 99 1455 199188]}) dt ["col1"] = dt ["people"] .apply (lambda x: 1 if x > = 100 else 0) # whether the new column dt ['counter'] = (dt ["col1"] = 0) is generated or not. Cumsum () # calculates the cumulative sum according to whether the col1 column is 0 or not Mark each contiguous region dt = dt [dt ["col1"]! = 0] # eliminate records with fewer than 100s gb=dt.groupby ("counter") ["id"] .count () # count the number of times of each marked value result=dt [dt ["counter"] .isin (gb[ gb > 2] .index) # filter the data that meet the criteria

One thing to note here is that if you directly slice the col1 column into a string of 0, you can figure out the number and length of areas that meet the conditions, but it's hard to find a specific area.

Split_col1= "" .join ([str (I) for i in dt ["col1"]]) .split ("0")

It was originally based on this idea, but it is more troublesome to find that the length matches the index of the string in the original list, especially when you need to find multiple index values.

However, this idea is very important because it is very simple to calculate only the maximum value of a continuous region.

Thank you for your reading, the above is "how to use SQL and Python respectively to achieve human flow query" content, after the study of this article, I believe you on how to use SQL and Python respectively to achieve human flow query this problem has a deeper understanding, the specific use of the situation also needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!

Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.

Views: 0

*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.

Share To

Database

Wechat

© 2024 shulou.com SLNews company. All rights reserved.

12
Report