
Case Analysis of a Python Crawler That Voice-Broadcasts the Weather Forecast


This article explains the case of a Python crawler that voice-broadcasts the weather forecast. The content is simple, clear, and easy to learn; please follow along step by step as we study it together.

I. Preliminary Preparation

The libraries used in this case are requests, lxml, and pyttsx3. Open a command prompt (via the cmd command on Windows) and install them with the following commands:

pip install requests

pip install lxml

pip install pyttsx3

requests is more convenient than urllib and saves us a lot of work. In a word, requests is the simplest HTTP library implemented in Python, and it is the recommended choice for crawlers.
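As a rough illustration of that convenience (a sketch, using https://example.com as a stand-in URL), here is the same GET performed with the standard-library urllib and with requests:

from urllib import request as urllib_request

import requests

# urllib: open the URL and decode the response bytes yourself
with urllib_request.urlopen('https://example.com') as f:
    html_urllib = f.read().decode('utf-8')

# requests: one call; .text handles the decoding for you
html_requests = requests.get('https://example.com').text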

lxml is a Python parsing library that supports parsing HTML and XML, supports XPath queries, and is very efficient.
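As a tiny self-contained example (the HTML snippet here is made up to mirror the page structure we will scrape later), lxml can parse a string and run an XPath query over it:

from lxml import etree

html = etree.HTML('<dl class="weather_info"><dt>Xiamen</dt><dd>Sunny, 25°C</dd></dl>')
# extract all text nodes under the dl element
print(html.xpath("//dl[@class='weather_info']//text()"))  # ['Xiamen', 'Sunny, 25°C']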

pyttsx3 is a Python package that converts text to speech. Unlike some alternative libraries, it works offline. The basic usage is as follows:

import pyttsx3

test = pyttsx3.init()
test.say('hello w3C subscription')
# the key line: without it, no speech is played
test.runAndWait()
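Beyond say(), pyttsx3 exposes a few tunable properties; a minimal sketch (default values vary by platform and installed speech engine):

import pyttsx3

test = pyttsx3.init()
test.setProperty('rate', 150)    # speaking rate in words per minute
test.setProperty('volume', 0.8)  # volume from 0.0 to 1.0
test.say('testing voice settings')
test.runAndWait()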

If you are on a Linux system, pyttsx3 text-to-speech may not work out of the box; you may also need to install espeak, ffmpeg, and libespeak1. The installation command is as follows:

sudo apt update && sudo apt install espeak ffmpeg libespeak1

A crawler scrapes content from web pages, so understanding HTML helps you make sense of a page's structure and content.

It is also best to understand the basics of the TCP/IP and HTTP protocols, so that you grasp the fundamentals of how network requests and transfers work.
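For instance, every HTTP response carries a status code and headers alongside the body; a minimal sketch (again using https://example.com as a stand-in URL):

import requests

resp = requests.get('https://example.com')
print(resp.status_code)                  # e.g. 200 for OK
print(resp.headers.get('Content-Type'))  # e.g. 'text/html; charset=UTF-8'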

II. Detailed Steps

1. Send a GET request to the target URL

First import the requests library, then use it to fetch the target page; what we request here is the Xiamen weather page on the weather website.

import requests

# send a request to the target url and get back a response object
resp = requests.get('https://www.tianqi.com/xiamen/')
# .text is the html of the response object
print(resp.text)

Of course, with these three lines of code alone, you will very likely fail to crawl the page and see a 403 instead. What does that mean?

A 403 is a common HTTP error: the server understood the client's request but refuses to fulfill it.

This happens because our crawler sends the request without any request headers, so the request announces itself as coming from a Python script, and most websites have anti-crawler mechanisms that refuse to serve content to such crawlers.
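You can see this announcement for yourself: requests sets a telltale default User-Agent on every request it sends (a small sketch; the exact version string depends on your installation):

import requests

resp = requests.get('https://example.com')
# the default User-Agent is something like 'python-requests/2.28.1'
print(resp.request.headers['User-Agent'])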

So is that the end of the story? Certainly not. As the saying goes, every measure has its countermeasure: we want the target server to respond, so we can disguise our crawler. In this small case, adding the commonly used User-Agent field is enough.

So, change the previous code to disguise the crawler as a browser request, as follows:

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}

# send a request to the target url and get back a response object
resp = requests.get('https://www.tianqi.com/xiamen/', headers=headers)
# .text is the html of the response object
print(resp.text)

Some readers will ask: where does the User-Agent field come from? Taking the Chrome browser as an example: open any web page and press F12 (or right-click an empty area and choose "Inspect"); refresh the page, click "Network" and then "Doc"; click "Headers" and find the User-Agent field under Request Headers in the details pane. Copy it and paste it into your editor, taking care to add it in the form of a dictionary.

2. Parse the page with lxml.etree

The data we crawl from the page is cluttered, and only part of it is what we really want. In this case, we only need the Xiamen weather details on the page.

So how do we extract it? This is where lxml.etree comes in.

Looking at the structure of the page, we can see that all the weather information we need sits under the definition list dl class='weather_info', so we just need to add the following code to parse it out:

from lxml import etree

html = etree.HTML(resp.text)
html_data = html.xpath("//dl[@class='weather_info']//text()")

It is not hard to see that what we get does not quite match what we want: the spaces and newline characters from the page are extracted too, and the result is a list.

So, we need to do the following next:

Txt = "Welcome to Weather broadcast Assistant"

For data in html_data:

Txt + = data

Printing again, it is not hard to see that we now have all the information we need, and it looks quite good; the only flaw is that "[switch cities]" is still there, and we don't want it either.

So what do we do? We can remove it with the string replace method.

txt = txt.replace('[switch cities]', '')
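As an optional alternative to the loop above (a sketch, not part of the original code), you can strip whitespace and drop empty fragments in one pass:

# strip each fragment and keep only the non-empty ones
clean = [t.strip() for t in html_data if t.strip()]
txt = "Welcome to Weather Broadcast Assistant " + " ".join(clean)
txt = txt.replace('[switch cities]', '')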

3. Broadcast the weather information with pyttsx3

At this point, all the data we want has been crawled, processed, and saved in the txt variable. Now let it be read aloud; it's time for the pyttsx3 library to shine. The code is as follows:

test = pyttsx3.init()
test.say(txt)
test.runAndWait()
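If you would rather keep a recording than play it live, pyttsx3 can also write the speech to an audio file (a sketch; the output filename is hypothetical, and format and engine support vary by platform):

test = pyttsx3.init()
test.save_to_file(txt, 'weather.wav')  # hypothetical output filename
test.runAndWait()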

At this point, our small case is done.

Exploring step by step until the feature works brings fun and a sense of achievement along the way; I believe you will be very pleased.

Finally, here is the complete source code:

import requests
import pyttsx3
from lxml import etree

url = 'https://www.tianqi.com/xiamen/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}

# send a request to the target url and get back a response object
resp = requests.get(url=url, headers=headers)
# .text is the html of the response object
html = resp.text

# parse the html and extract all text under the weather_info definition list
html = etree.HTML(html)
html_data = html.xpath("//dl[@class='weather_info']//text()")

# join the text fragments into a single string
txt = "Welcome to Weather Broadcast Assistant"
for data in html_data:
    txt += data
print(txt)

# drop the unwanted link text and add a closing line
txt = txt.replace('[switch cities]', '')
txt += '\nThe broadcast is over! Thank you!'
print(txt)

# read the result aloud
test = pyttsx3.init()
test.say(txt)
test.runAndWait()
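For everyday use you may want the script to fail gracefully; a hardened sketch (the timeout value and error handling are additions of mine, not part of the original):

import requests
import pyttsx3
from lxml import etree

url = 'https://www.tianqi.com/xiamen/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}

try:
    # bound the wait and surface HTTP errors such as 403 explicitly
    resp = requests.get(url, headers=headers, timeout=10)
    resp.raise_for_status()
except requests.RequestException as exc:
    print(f'request failed: {exc}')
else:
    html = etree.HTML(resp.text)
    html_data = html.xpath("//dl[@class='weather_info']//text()")
    txt = "Welcome to Weather Broadcast Assistant" + "".join(html_data)
    txt = txt.replace('[switch cities]', '')
    test = pyttsx3.init()
    test.say(txt)
    test.runAndWait()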

Thank you for reading. That concludes "Case Analysis of a Python Crawler That Voice-Broadcasts the Weather Forecast". After studying this article, I believe you have a deeper understanding of the topic; the specifics still need to be verified in practice.
