How to use Python to scrape daily hot news

Updated 2025-01-30. From: SLTechnology News&Howtos (shulou)

Shulou(Shulou.com)06/01 Report--

This article explains how to use Python to scrape daily hot news. The content is short and easy to follow; work through the steps below in order.

Basic development environment

Python 3.6

PyCharm

Modules used:

    import parsel
    import requests
    import re

Analysis of the target web page

The target is the international news column of chinanews.com.

The column loads its list from a data interface. The response of that interface contains the news headlines and the URL of each news details page.

How to extract the URLs

1. Convert the response to JSON and read the value of the url key.

2. Use a regular expression to match the URLs.

Both methods work; which one to use is a matter of personal preference.
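The two approaches above can be compared on a small sample. This is a minimal sketch: the sample string and its "docs" key are hypothetical, and the real interface response may be shaped differently (for example, wrapped in extra text that has to be stripped before JSON parsing).

```python
import json
import re

# Hypothetical sample payload; the real interface response may differ.
sample = (
    '{"docs": ['
    '{"title": "Headline A", "url": "https://example.com/a.shtml"}, '
    '{"title": "Headline B", "url": "https://example.com/b.shtml"}'
    ']}'
)

# Method 1: parse the text as JSON and read the "url" key of each entry.
urls_from_json = [doc["url"] for doc in json.loads(sample)["docs"]]

# Method 2: match the URLs directly with a regular expression.
# \s* tolerates optional whitespace around the colon.
urls_from_regex = re.findall(r'"url"\s*:\s*"(.*?)"', sample)

print(urls_from_json)
print(urls_from_regex)
```

The JSON route fails loudly when the response is not valid JSON, while the regex route keeps working on loosely formatted text but can also match url keys you did not intend.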

To turn pages, change the pager parameter in the interface URL; its value corresponds to the page number.
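As a rough illustration of paging, the interface URL can be treated as a template in which only pager changes. The URL is the one used later in this article; the helper name build_page_urls and the choice of 0 as the first page number are assumptions made for this sketch.

```python
# URL template from the article; only the pager value changes between pages.
LIST_URL = 'https://channel.chinanews.com/cns/cjs/gj.shtml?pager={}&pagenum=9&t=5_58'


def build_page_urls(first_page, last_page):
    """Hypothetical helper: interface URLs for pages first_page..last_page inclusive."""
    return [LIST_URL.format(page) for page in range(first_page, last_page + 1)]


for url in build_page_urls(0, 2):
    print(url)
```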

On the details page, the news text sits in p tags inside a div tag, so it can be extracted with ordinary page parsing.
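The idea of "p tags inside a div" can be sketched with the standard library alone. Note that this uses html.parser as a stand-in, not the parsel selector the article relies on, and the sample HTML is made up; only the class name left_zw comes from the article's selector.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text of <p> tags nested inside <div class="left_zw">."""

    def __init__(self):
        super().__init__()
        self.div_depth = 0    # greater than 0 while inside the target div
        self.in_p = False     # True while inside a <p> of the target div
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == 'div':
            # Enter the target div, or track nested divs once inside it.
            # Only an exact class value of "left_zw" is recognised here.
            if self.div_depth or ('class', 'left_zw') in attrs:
                self.div_depth += 1
        elif tag == 'p' and self.div_depth:
            self.in_p = True
            self.paragraphs.append('')

    def handle_endtag(self, tag):
        if tag == 'div' and self.div_depth:
            self.div_depth -= 1
        elif tag == 'p':
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self.paragraphs[-1] += data


# Made-up sample page; the <p> outside the div must be ignored.
sample = (
    '<div class="left_zw"><p>First paragraph.</p><p>Second paragraph.</p></div>'
    '<p>Footer</p>'
)
collector = ParagraphCollector()
collector.feed(sample)
print(collector.paragraphs)
```

A CSS selector library expresses the same rule in one line, which is why the article's code uses parsel for this step.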

Ways to save the data

1. Save as plain text (.txt).

2. Save as PDF.

Summary of the overall crawling approach

On the column list page, click to load more news and find the URL of the data interface.

In the data returned by the interface, match the URL of each news details page.

Extract the news content from each details page with the usual parsing tools (re, css, xpath).

Save the data.

Code implementation

Get the source code of the web page

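    # The article does not show the request headers, so this User-Agent is an assumed placeholder.
    headers = {
        'User-Agent': 'Mozilla/5.0'
    }


    def get_html(html_url):
        """
        Get the web page source (response).
        :param html_url: web page URL
        :return: response object
        """
        response = requests.get(url=html_url, headers=headers)
        return response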

Get the url address of each news article

    def get_page_url(html_data):
        """
        Get the URL of each news article.
        :param html_data: response.text
        :return: list of news article URLs
        """
        page_url_list = re.findall('"url":"(.*?)"', html_data)
        return page_url_list

File names cannot contain special characters, so the news headline has to be sanitized before it is used as a file name.

    def file_name(name):
        """
        Replace characters that are not allowed in file names.
        :param name: news headline
        :return: headline with special characters replaced by underscores
        """
        replace = re.compile(r'[\\/:*?"<>|]')
        new_name = re.sub(replace, '_', name)
        return new_name
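A quick check of the sanitizing step, as a sketch: the headline below is invented, and the character class is the set of characters that Windows rejects in file names.

```python
import re


def file_name(name):
    """Replace characters that are invalid in Windows file names with underscores."""
    return re.sub(r'[\\/:*?"<>|]', '_', name)


# Invented headline containing a colon, quotes, a slash and a question mark.
print(file_name('Breaking: "A/B" test?'))  # prints Breaking_ _A_B_ test_
```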

Save data

    import os


    def download(content, title):
        """
        Save the news content to a txt file with open().
        :param content: news content
        :param title: news headline
        :return:
        """
        os.makedirs('News', exist_ok=True)  # make sure the output folder exists
        path = os.path.join('News', title + '.txt')
        with open(path, mode='a', encoding='utf-8') as f:
            f.write(content)
            print('saving', title)

Main function

    def main(url):
        """
        Main function.
        :param url: news list page (interface) URL
        :return:
        """
        html_data = get_html(url).text  # interface data (response.text)
        lis = get_page_url(html_data)  # list of news article URLs
        for li in lis:
            page_data = get_html(li).content.decode('utf-8', 'ignore')  # news details page source
            selector = parsel.Selector(page_data)
            title = re.findall('<title>(.*?)</title>', page_data, re.S)[0]  # news headline
            new_title = file_name(title)
            new_data = selector.css('#cont_1_1_2 div.left_zw p::text').getall()
            content = ''.join(new_data)
            download(content, new_title)


    if __name__ == '__main__':
        # Page numbers to crawl; widen the range to fetch more pages.
        for page in range(1):
            url_1 = 'https://channel.chinanews.com/cns/cjs/gj.shtml?pager={}&pagenum=9&t=5_58'.format(page)
            main(url_1)

[Figure: screenshot of the running result]

Thank you for reading. That covers how to use Python to scrape daily hot news; the details still need to be verified in practice.
