How to Crawl Historical College Entrance Examination (Gaokao) Score Lines with Python

2025-02-14 Update From: SLTechnology News&Howtos


This article shows how to use Python to crawl the college entrance examination (gaokao) score lines of past years, store them, and chart how they have changed.

Preface

The college entrance examination (gaokao) is over, and most students are probably relaxing after such a long stretch of pressure. The results will not be released for a while yet, but some students are eager to know how they did. So let's crawl the score lines of recent years from the gaokao website and look at how they have changed, to get a rough idea of what to expect and relax a little more.

Libraries and tools used

BeautifulSoup

MongoDB

ECharts

1. Overall approach

On the gaokao website (gaokao.com) you can look up the score lines of every province, and both liberal arts and science have data from 2009 to 2017. We can crawl this data directly, save it in MongoDB, and then plot it with ECharts to see the trend of the score lines more clearly.

2. Crawl the data

(1) Get the score-line page of each province

There are two ways to achieve this.

1). Build the URLs by string concatenation. The links follow a simple pattern: only the pinyin of the province changes, so substituting it gives a URL that can be requested:

http://www.gaokao.com/guangdong/fsx/

http://www.gaokao.com/shanghai/fsx/

The pypinyin module (a tool for converting Chinese characters to pinyin) is recommended for this. Calling its lazy_pinyin function directly returns the pinyin of a province. Because the return value is a list, it has to be processed before it can be used.

    >>> from pypinyin import lazy_pinyin
    >>> lazy_pinyin('北京')
    ['bei', 'jing']
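The list returned by lazy_pinyin still has to be joined into one string before it can go into the URL. A minimal sketch of that step follows; the helper name province_url and the BASE_URL constant are made up for illustration, and the function takes the syllable list as input so it runs without pypinyin installed.

```python
# Hypothetical helper: build a province URL from the syllable list that
# pypinyin.lazy_pinyin returns, e.g. ['guang', 'dong'].
BASE_URL = 'http://www.gaokao.com/{}/fsx/'


def province_url(syllables):
    """Join the pinyin syllables into one slug and format the URL."""
    slug = ''.join(syllables)
    return BASE_URL.format(slug)


print(province_url(['guang', 'dong']))
print(province_url(['shang', 'hai']))
```

In real use the argument would be lazy_pinyin(name) for each province name. One caveat: lazy_pinyin drops tones, so 山西 and 陕西 both come out as ['shan', 'xi'], and such provinces would need manual handling.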

2). Read the province links from the region navigation bar on the page, which gives the URLs directly.

Get the links to the provinces:

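    import requests
    import pymongo
    from bs4 import BeautifulSoup

    # MongoDB setup. The database name matches the visualization code later on;
    # the collection name 'provice_href' is assumed from how it is used below.
    client = pymongo.MongoClient('localhost', 27017)
    gaokao = client['gaokao']
    provice_href = gaokao['provice_href']

    # Get the provinces and their links
    pro_link = []

    def get_provice(url):
        web_data = requests.get(url, headers=header)
        soup = BeautifulSoup(web_data.content, 'lxml')
        provice_link = soup.select('.area_box > a')
        for link in provice_link:
            href = link['href']
            provice = link.select('span')[0].text
            data = {
                'href': href,
                'provice': provice
            }
            provice_href.insert_one(data)  # store in the database
            pro_link.append(href)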

(2) Crawl the score lines

Now the score lines themselves can be crawled. Inspect the page elements to find the relevant tags, then filter the content directly with BeautifulSoup.

    import re

    # score_detail is the MongoDB collection gaokao['score_detail']

    # Get the score lines
    def get_score(url):
        web_data = requests.get(url, headers=header)
        soup = BeautifulSoup(web_data.content, 'lxml')
        # Get the province name. The slice was garbled in the source and is
        # assumed to be [0:-5], which strips a fixed five-character suffix.
        provice = soup.select('.col-nav span')[0].text[0:-5]
        # Get the liberal arts / science categories
        categories = soup.select('h4.ft14')
        category_list = []
        for item in categories:
            category_list.append(item.text.strip().replace(' ', ''))  # remove spaces
        # Get the scores
        tables = soup.select('h4 ~ table')
        for index, table in enumerate(tables):
            # Match rows by class with a regular expression (pattern was garbled
            # in the source and is assumed to be '^c_\S*')
            tr = table.find_all('tr', attrs={'class': re.compile(r'^c_\S*')})
            for j in tr:
                td = j.select('td')
                score_list = []
                for k in td:
                    # Cells without a class hold the score for each year
                    if 'class' not in k.attrs:
                        score = k.text.strip()
                        score_list.append(score)
                    # The cell with a class holds the score-line tier
                    else:
                        score_line = k.text.strip()
                score_data = {
                    'provice': provice.strip(),        # province
                    'category': category_list[index],  # liberal arts or science
                    'score_line': score_line,          # score-line tier
                    'score_list': score_list           # list of scores
                }
                score_detail.insert_one(score_data)  # insert into the database

3. Start crawling

Since there are more than 30 provinces, multithreading is used here to speed up the crawl.

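    # The source does not show the Pool import. The thread-based Pool is assumed
    # here because the text describes multithreading.
    from multiprocessing.dummy import Pool

    if __name__ == '__main__':
        header = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:58.0) Gecko/20100101 Firefox/58.0',
            'Connection': 'keep-alive'
        }
        url = 'http://www.gaokao.com/guangdong/fsx/'
        get_provice(url)
        pool = Pool()
        pool.map(get_score, pro_link)  # crawl the province pages in parallel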

With multithreaded crawling, all the data can be fetched in less than a minute, which shows how much multithreading helps here.
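The pattern can be tried without touching the network. The sketch below is an illustration under stated assumptions: fake_fetch is a made-up stand-in for the real request function, and the thread-based Pool from multiprocessing.dummy is used so that the workers share the module's global variables.

```python
import time
from multiprocessing.dummy import Pool  # thread-based Pool with the same API


def fake_fetch(url):
    """Stand-in for a real request: wait briefly, then return a result."""
    time.sleep(0.1)  # simulates network latency
    return url.rstrip('/').split('/')[-2]  # province slug taken from the URL


urls = [
    'http://www.gaokao.com/guangdong/fsx/',
    'http://www.gaokao.com/shanghai/fsx/',
]

with Pool(4) as pool:
    results = pool.map(fake_fetch, urls)  # results keep the input order

print(results)
```

Threads suit this job because the work is dominated by waiting on the network. pool.map returns the results in the same order as the input list, no matter which request finishes first.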

4. Data visualization

Crawling the data is only the first step; next it has to be processed and displayed. Query the data from MongoDB and clean it. Because pyecharts was not working properly on my machine, I used echarts for the display.

1). Filter by province and other fields

Use MongoDB's find function to restrict what is returned.

    import pymongo
    import charts

    client = pymongo.MongoClient('localhost', 27017)
    gaokao = client['gaokao']
    score_detail = gaokao['score_detail']

    # Filter by score-line tier, province, and liberal arts / science
    def get_score(line, pro, cate):
        score_list = []
        query = {"$and": [{"score_line": line}, {"provice": pro}, {'category': cate}]}
        for i in score_detail.find(query):
            score_list = i['score_list']
            # Drop the '-' placeholder used for a column without data
            score_list = [s for s in score_list if s != '-']
            score_list = list(map(int, score_list))
            score_list.reverse()
        return score_list
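The cleaning step can be checked on its own, without a database. This is a minimal sketch with made-up sample values; the helper name clean_scores is hypothetical, and it assumes the stored list is newest year first with '-' marking a year without data.

```python
def clean_scores(raw_scores, placeholder='-'):
    """Drop placeholder cells, convert to int, and return oldest year first."""
    numeric = [int(s) for s in raw_scores if s != placeholder]
    numeric.reverse()  # stored newest first, so reverse for a time axis
    return numeric


# Made-up stored list: newest year first, one year without data.
sample = ['555', '583', '-', '548']
print(clean_scores(sample))
```

Dropping a placeholder shortens the list, so the year axis has to drop the same position, otherwise the scores and the years no longer line up in the chart.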

2). Define the data to plot

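    # Get the liberal arts and science scores.
    # The string values are restored to Chinese on the assumption that they
    # must match the text stored from the site.
    line = '一本'       # first-tier score line
    pro = '北京'        # Beijing
    cate_wen = '文科'   # liberal arts
    cate_li = '理科'    # science
    wen = get_score(line, pro, cate_wen)  # liberal arts
    li = get_score(line, pro, cate_li)    # science
    # Define the years
    year = [2017, 2016, 2015, 2014, 2013, 2012, 2011, 2010, 2009]
    year.reverse()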

3). Display as a line chart

    series = [
        {'name': 'liberal arts', 'data': wen, 'type': 'line'},
        {'name': 'science', 'data': li, 'type': 'line', 'color': '#ff0066'}
    ]
    options = {
        'chart': {'zoomType': 'xy'},
        'title': {'text': '{} {} score line'.format(pro, line)},
        'subtitle': {'text': 'Source: gaokao.com'},
        'xAxis': {'categories': year},
        'yAxis': {'title': {'text': 'score'}}
    }
    charts.plot(series, options=options, show='inline')

This produces a trend chart of the score lines over the years. To see other provinces, change the arguments passed to get_score.

5. Predicting the score line

From the line chart we can roughly predict the 2018 gaokao score lines for Beijing: between 550 and 560 for liberal arts and between 530 and 540 for science. This is only an estimate, and special circumstances could cause large fluctuations. This year's score line could also be estimated with Lagrange interpolation, which may be more precise, but because that process is more involved, only a visual estimate is given here.
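For readers who want to try the Lagrange approach, here is a minimal pure-Python sketch. It is not the author's code, the helper name lagrange_eval is hypothetical, and the numbers are made up for illustration rather than real score lines.

```python
def lagrange_eval(xs, ys, x):
    """Evaluate the Lagrange polynomial through the points (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)  # basis polynomial factor
        total += term
    return total


# Made-up numbers for illustration only, not real score lines.
years = [2015, 2016, 2017]
scores = [500, 510, 530]
print(lagrange_eval(years, scores, 2018))
```

Fitting all nine years at once gives a degree-8 polynomial, and evaluating such a polynomial outside the data range can swing wildly. Using only the last few years, as in this sketch, keeps the degree low, and the result should still be treated as a rough guide.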

That is how to crawl historical gaokao score lines with Python: collect the province links, scrape each score table into MongoDB, and plot the trend.
