
How to use Python to plan a trip

2025-03-28 Update From: SLTechnology News&Howtos


Many newcomers are unsure how to use Python to plan a trip. This article walks through the process in detail for anyone who wants to learn it.

Travel is an effective way to lift your mood, and more and more office workers and students look forward to using their holidays to get away, broaden their horizons, and relieve stress. Yet when a holiday actually arrives, many people are stumped by the question of where to go. June is a good time to travel, so let's see how to use Python to plan a trip.

I. Data acquisition

In recent years, independent travel, which is both economical and relaxing, has become a favorite among young people. Here I use a travel website I particularly like, and we plan our own itinerary by analyzing the travel plans posted by its users (the "hikers"). The first step, of course, is to crawl the website's data.

1. Analyze the target web page

To collect everyone's travel plans, we go to the "companionship" section of the website and filter for trips departing "within a month". The URL does not change as we page through the results, which suggests that the page is loaded dynamically through JavaScript. To crawl it, we first have to find the real request URL and the format of the data it returns.

After some experimenting, we find the real request URL and its key parameters. The response is JSON, and one of its fields contains a block of HTML text.
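A response of this kind has to be decoded twice: once as JSON, then once more as HTML. The sketch below shows the idea using only the standard library. The response layout (`data` then `html`), the sample links, and the helper names are assumptions made for illustration, since the article does not show the actual response.

```python
import json
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)


def extract_detail_links(response_text):
    """Decode the JSON body, then parse the HTML fragment embedded in it."""
    body = json.loads(response_text)
    # Assumption: the HTML fragment sits under body["data"]["html"].
    fragment = body.get("data", {}).get("html", "")
    parser = LinkCollector()
    parser.feed(fragment)
    return parser.links


# Made-up sample response standing in for what the site returns.
sample = json.dumps({
    "data": {
        "html": '<li><a href="/trip/1.html">Lhasa</a></li>'
                '<li><a href="/trip/2.html">Xinjiang</a></li>'
    }
})
detail_links = extract_detail_links(sample)
```

In a real crawler the same parser class could be extended to pick up the destination and sponsor fields from the list items as well.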

2. Decide what to crawl

Before crawling, we need to decide which data to collect. The itinerary list page shows the destination, a summary of the itinerary, and the sponsor's ID and gender. This information is valuable, but more detail would help our planning further, so let's look at the details page.

The details page gives the departure time, trip duration, and departure location, as well as the people who have already signed up. This data largely reflects what travelers actually intend to do, so we must collect it.

3. Crawl the data

The general approach is to crawl the sponsor and the details-page URL of each trip from the index page, then visit the details page to grab the departure time, duration, destination, departure city, number of companions wanted, and number of people who have signed up. The index-page data and details-page data of each trip are merged and stored as that trip's complete record. The crawler's main entry point works as follows:

Payload parameters: flag determines the sort order of the itineraries. Its possible values are 1, 2, and 3, which stand for "about to leave", "* release", and "popular companionship" respectively.

offset is the current page number, starting from 0 by default; middid is the trip destination, with 0 meaning the destination is undecided; a separate parameter sets the departure-time window, where a value of 3 selects itineraries departing within one month.

get_info() method: grabs the itinerary information on each page and then moves to the next page. If it cannot get any valid information, the crawl ends.

Data storage: because the data set is small, it can be held in a DataFrame first and then written to a CSV file in one go.
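The paging loop and storage step described above can be sketched roughly as follows. This is not the original crawler: the request itself is replaced by a `fetch_page` callable (here a fake that returns canned rows), because the article does not give the real URL. The payload key `time` is a placeholder for the unnamed departure-time parameter, and the row fields are invented.

```python
import io

import pandas as pd


def build_payload(flag, offset, middid, time_value):
    """Assemble the request payload for one index page."""
    # "time" is a placeholder key: the article does not name this parameter.
    return {"flag": flag, "offset": offset, "middid": middid, "time": time_value}


def get_info(fetch_page, max_pages=50):
    """Collect rows page by page; stop as soon as a page yields no valid rows."""
    rows = []
    for offset in range(max_pages):
        payload = build_payload(flag=3, offset=offset, middid=0, time_value=3)
        page_rows = fetch_page(payload)
        if not page_rows:
            break
        rows.extend(page_rows)
    return rows


def fake_fetch(payload):
    """Stand-in for the HTTP request: two pages of made-up rows, then nothing."""
    pages = {
        0: [{"destination": "Lhasa", "sponsor": "user_a"}],
        1: [{"destination": "Xinjiang", "sponsor": "user_b"}],
    }
    return pages.get(payload["offset"], [])


# Hold everything in one DataFrame, then write the CSV in a single call.
trips = pd.DataFrame(get_info(fake_fetch))
buffer = io.StringIO()  # a file path would be used in practice
trips.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

Passing the fetch function in as an argument keeps the paging logic testable without touching the network; the `max_pages` cap is an extra safeguard against an endless loop.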

II. Data cleaning

Looking at the raw data, we can see that every record contains a lot of noise:

In the "departure time" column, all we want is the date.

In the "number of applicants" column, all we want is the number, without the surrounding text.

Some trips cover multiple destinations, which gets in the way of our destination analysis.

We therefore have to clean the data first to lay the foundation for the actual analysis.

1. Standardize the format

First, clean the duration, companions-wanted, applicants (female), and applicants (male) columns, keeping only the numeric part. Second, strip the ":" and everything before it from the departure-time and departure-place columns. Thanks to the pandas.Series.str methods, both tasks are easy. The two functions are:

only_num(self, col_list): removes the non-numeric parts of the data.

no_colon(self, col_list): removes the ":" and everything before it.
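A minimal sketch of these two helpers, written as plain functions rather than methods of the author's class. The column names and sample values are invented, and matching the full-width colon "：" alongside ":" is an assumption about how the scraped text might look.

```python
import pandas as pd


def only_num(df, col_list):
    """Keep only the digits in each listed column and convert to numbers."""
    for col in col_list:
        digits = df[col].astype(str).str.replace(r"\D+", "", regex=True)
        df[col] = pd.to_numeric(digits, errors="coerce")
    return df


def no_colon(df, col_list):
    """Drop the first colon and everything before it in each listed column."""
    for col in col_list:
        # Assumption: the label may end in either ":" or the full-width "：".
        stripped = df[col].astype(str).str.replace(r"^.*?[:：]", "", regex=True)
        df[col] = stripped.str.strip()
    return df


# Made-up rows that imitate the noisy scraped values.
raw = pd.DataFrame({
    "duration": ["5 days", "12 days"],
    "departure_time": ["Departure time: 06-15", "Departure time：06-16"],
})
cleaned = no_colon(only_num(raw.copy(), ["duration"]), ["departure_time"])
```

The lazy pattern `^.*?[:：]` stops at the first colon, so a value that itself contains a colon (such as a clock time) would keep everything after the label.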

2. Split destinations

As mentioned earlier, a trip that includes several destinations interferes with our analysis. The solution is to split the destination data.

We split the single column into X columns (X being the number of destinations in the trip), again using the pandas.Series.str methods.
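One way to do this split is `Series.str.split` with `expand=True`, which returns one column per piece. The sketch below assumes destinations are separated by "/" and uses made-up column names and rows; the article does not show the real separator.

```python
import pandas as pd

# Made-up rows; the "/" separator is an assumption.
trips = pd.DataFrame({
    "destination": ["Lhasa", "Xinjiang/Lhasa", "Chengdu/Lhasa/Xinjiang"],
})

# expand=True yields as many columns as the longest trip has destinations.
parts = trips["destination"].str.split("/", expand=True)
parts.columns = [f"destination_{i + 1}" for i in range(parts.shape[1])]

# Keep the original column and append the split columns next to it.
trips = trips.join(parts)
```

Trips with fewer destinations than the maximum get missing values in the extra columns, which later counting steps need to skip.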

III. Data analysis

Now we can analyze the data to find the travel patterns for June. We use pyecharts to visualize the results so they are easier to read.

1. Male to female ratio

First we analyze the gender of the people involved in the travel plans. The gender distribution of itinerary sponsors and participants is easy to obtain with the DataFrame's sum() and groupby().count() methods.

The results show that most sponsors are women, about 60% of the total, while among participants the ratio is reversed, with men making up about 60%. Perhaps the women are better at planning itineraries, while the men mostly prefer to tag along.
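A rough sketch of that calculation: sponsors are counted with groupby().count(), and participants are totalled with sum() over the two applicant columns. The column names and the four toy rows are invented, so the resulting shares say nothing about the real data.

```python
import pandas as pd

# Toy data with hypothetical column names.
trips = pd.DataFrame({
    "sponsor_gender": ["F", "F", "M", "F"],
    "applicants_female": [1, 0, 2, 1],
    "applicants_male": [2, 1, 1, 0],
})

# Sponsors: one row per trip, so count the rows in each gender group.
sponsor_counts = trips.groupby("sponsor_gender")["sponsor_gender"].count()
sponsor_share = sponsor_counts / sponsor_counts.sum()

# Participants: add up the per-trip applicant numbers for each gender.
participant_totals = trips[["applicants_female", "applicants_male"]].sum()
participant_share = participant_totals / participant_totals.sum()
```

The two share Series can then be handed to whichever charting library is in use.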

2. Departure time

Here we first use the DataFrame's groupby() method to group the data by the "travel time" key, count the number of trips and the number of participants per day, and then draw a line chart.

Judging from the results, June 15 and 16 are the travel peak for June, since the Dragon Boat Festival holiday begins then. If you plan to travel over the Dragon Boat Festival, remember to prepare in advance. Several smaller peaks fall on weekends, which suggests that most people who like independent travel are office workers (or college students).
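The grouping step might look like the sketch below, which uses pandas named aggregation to get both counts in one pass. The column names and rows are made up for illustration, and the chart itself is left out.

```python
import pandas as pd

# Toy data with hypothetical column names.
trips = pd.DataFrame({
    "departure_time": ["06-15", "06-15", "06-16", "06-20"],
    "applicants_female": [1, 0, 2, 1],
    "applicants_male": [2, 1, 1, 0],
})
trips["participants"] = trips["applicants_female"] + trips["applicants_male"]

# One row per day: how many trips leave, and how many people signed up.
daily = trips.groupby("departure_time").agg(
    trip_count=("participants", "size"),
    participant_count=("participants", "sum"),
)
```

The index of `daily` gives the x-axis values for a line chart, and the two columns give the two series.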

3. Destination selection

To analyze where the posted itineraries go, first add every destination in the data to a list (duplicates included), then use Counter from the collections module to count how often each destination appears.

The chart lists the most frequent destinations; the longer the bar, the more often the destination appears. Looking closely, we find that people prefer less commercialized areas, such as Lhasa and Xinjiang.

As an independent-travel enthusiast myself, I also prefer unspoiled scenery, which really does help settle the mood. If you are planning a trip and do not know where to go, you might choose from the destinations above.
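As a small illustration of the counting step, the sketch below flattens per-trip destination lists into one list and feeds it to Counter. The trips are invented.

```python
from collections import Counter

# Made-up trips, each with one or more destinations.
destinations_per_trip = [
    ["Lhasa"],
    ["Xinjiang", "Lhasa"],
    ["Chengdu", "Lhasa", "Xinjiang"],
]

# Flatten into a single list, keeping duplicates on purpose.
all_destinations = [place for trip in destinations_per_trip for place in trip]

# Counter maps each destination to the number of times it appears.
frequency = Counter(all_destinations)
top_two = frequency.most_common(2)
```

`most_common(n)` returns the n most frequent entries in descending order, which is convenient for feeding a bar chart.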

4. Who is signing up?

First, the numbers of men and women signed up for each destination are stored in two dicts, and the 10 destinations with the most participants are selected from each.

Surprisingly, although the rankings in the two lists differ, the top 10 destinations for men and for women are exactly the same set.

However, the ratio of men to women varies greatly from place to place. If you are hoping for a chance encounter, a chart of that breakdown may help.
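A minimal sketch of the two-dict approach, assuming each record is a (destination, female applicants, male applicants) tuple. The records and the `top_n` helper are hypothetical.

```python
from collections import defaultdict

# Made-up records: (destination, female applicants, male applicants).
records = [
    ("Lhasa", 2, 3),
    ("Xinjiang", 1, 5),
    ("Lhasa", 1, 1),
    ("Chengdu", 4, 0),
]

female_by_destination = defaultdict(int)
male_by_destination = defaultdict(int)
for destination, female, male in records:
    female_by_destination[destination] += female
    male_by_destination[destination] += male


def top_n(counts, n=10):
    """Return the n destinations with the most sign-ups, largest first."""
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


top_female = top_n(female_by_destination)
top_male = top_n(male_by_destination)
```

Comparing the destination names in `top_female` and `top_male` as sets shows whether both groups favor the same places even when the rankings differ.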
