This article explains a "Python crawler voice broadcast weather forecast" case study. The content is simple, clear, and easy to learn; please follow the editor's train of thought step by step and work through the case together.
I. Preliminary preparation
The libraries used in this case are requests, lxml, and pyttsx3. Open a command prompt (cmd) and install them with the following commands:
pip install requests
pip install lxml
pip install pyttsx3
requests is more convenient than urllib and can save us a lot of work. In short, requests is the simplest HTTP library available in Python, and it is the recommended choice for crawlers.
lxml is a Python parsing library that supports HTML and XML parsing as well as XPath, and it is very efficient.
pyttsx3 is a Python package that converts text to speech. Unlike some other text-to-speech packages, pyttsx3 works offline and converts text to speech directly. The basic usage is as follows:
import pyttsx3

test = pyttsx3.init()
test.say('hello w3C subscription')
# the key line: without it, the voice will not be played
test.runAndWait()
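If the default voice is too fast or too quiet, pyttsx3 also exposes rate and volume properties via setProperty. A minimal sketch (the values here are only illustrative):

import pyttsx3

test = pyttsx3.init()
test.setProperty('rate', 150)    # speaking speed in words per minute (illustrative value)
test.setProperty('volume', 0.9)  # volume between 0.0 and 1.0 (illustrative value)
test.say('hello w3C subscription')
test.runAndWait()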
If you are on a Linux system, pyttsx3 text-to-speech may not work out of the box; you may also need to install espeak, ffmpeg, and libespeak1. The installation command is as follows:
sudo apt update && sudo apt install espeak ffmpeg libespeak1
A crawler crawls the relevant content of a web page, so understanding HTML helps you make sense of a page's structure and content.
It is also best to understand the basics of the TCP/IP and HTTP protocols, so that you can grasp the basic principles of network requests and network transmission.
II. Detailed steps
1. Request the target URL
We first import the requests library and then use it to get the target web page. What we request is the Xiamen weather page on the weather website.
import requests

# send a request to the target url and get back a response object
resp = requests.get('https://www.tianqi.com/xiamen/')
# .text is the html of the response object
print(resp.text)
Of course, with these three lines of code alone, it is very likely that you cannot crawl the page and instead receive a 403. What does that mean?
A 403 is a common HTTP error: the server understands the client's request but refuses to process it, so the resource is not made available.
This happens because our crawler does not add a request header: the script identifies itself as a plain Python client when it sends the request, and most websites have an anti-crawler mechanism that refuses to serve content to such requests.
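If you are curious what the server sees by default, requests exposes its built-in identification string; this short check (not part of the original walkthrough) simply prints it:

import requests

# by default requests identifies itself as something like 'python-requests/2.x.y'
print(requests.utils.default_user_agent())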
Is that the end of the road? Certainly not; as the saying goes, for every policy there is a countermeasure. We want the target server to respond, so we can camouflage our crawler. In this small case, adding the commonly used User-Agent field is enough.
So we change the previous code to disguise the crawler as a browser request, as follows:
import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
# send a request to the target url and get back a response object
resp = requests.get('https://www.tianqi.com/xiamen/', headers=headers)
# .text is the html of the response object
print(resp.text)
Some readers will ask: where does the User-Agent field come from? Taking the Chrome browser as an example: open any web page and press F12 (or right-click an empty spot and choose "Inspect"); refresh the page, click "Network", then "Doc", then "Headers", and find the User-Agent field under Request Headers in the information panel. Copy it and paste it into your editor; note that it must be added in the form of a dictionary.
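As a quick sanity check (my own addition, not one of the original steps), you can print resp.status_code after adding the headers: 200 means the disguised request was accepted, while 403 means it was still rejected.

import requests

headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
resp = requests.get('https://www.tianqi.com/xiamen/', headers=headers)
print(resp.status_code)  # 200 if accepted, 403 if still refused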
2. Parse the page with lxml.etree
The data crawled from the page is cluttered, and only part of it is what we really want. In this case we only need the detailed Xiamen weather on the page, as shown in the figure:
So how do we extract it? This is where lxml.etree comes in.
Looking at the structure of the page, we find that all the weather information we need sits inside the description list "dl class='weather_info'", so we only need to add the following code to the previous code to parse it out:
from lxml import etree

html = etree.HTML(resp.text)
html_data = html.xpath("//dl[@class='weather_info']//text()")
It is not hard to see that what we get is not quite what we want: the spaces and newline characters in the page are extracted as well, and the result is a list. So the next step is the following:
txt = "Welcome to Weather broadcast Assistant"
for data in html_data:
    txt += data
Print it again and it is not hard to see that we now have all the information we need and it reads quite nicely; the only flaw is that "[switch cities]" is still there, and we do not want it.
So what do we do? We can remove it with the string replace method.
txt = txt.replace('[switch cities]', '')
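If you also want to drop the stray spaces and newlines mentioned above, one alternative to the loop (my own variation, not the author's code) is to strip each fragment and skip the empty ones before joining:

# strip whitespace from every fragment, drop the empty ones, then join
cleaned = [fragment.strip() for fragment in html_data if fragment.strip()]
txt = "Welcome to Weather broadcast Assistant" + "".join(cleaned)
txt = txt.replace('[switch cities]', '')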
3. Broadcast the weather information with pyttsx3
At this point, all the data we want has been crawled, processed, and saved in the txt variable. Now let the program read it out; it is time for the pyttsx3 library to shine. The code is as follows:
test = pyttsx3.init()
test.say(txt)
test.runAndWait()
At this point, our small case is done; for further study, courses on Python static crawlers and the Python Scrapy web crawler are worth a look.
Exploring step by step until the feature works brings real fun and a sense of achievement, and I believe you will enjoy it.
Finally, here is the complete source code:
import requests
import pyttsx3
from lxml import etree

url = 'https://www.tianqi.com/xiamen/'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}

# send a request to the target url and get back a response object
resp = requests.get(url=url, headers=headers)
# .text is the html of the response object
html = resp.text
html = etree.HTML(html)
html_data = html.xpath("//dl[@class='weather_info']//text()")

txt = "Welcome to Weather broadcast Assistant"
for data in html_data:
    txt += data
print(txt)

txt = txt.replace('[switch cities]', '')
txt += '\nThe broadcast is over! Thank you!'
print(txt)

test = pyttsx3.init()
test.say(txt)
test.runAndWait()
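For readers who want something slightly more robust, here is a sketch of the same steps wrapped in a function (my own variation, not the author's code); it adds a request timeout and a status-code check, and the city_slug parameter simply fills in the same URL pattern used above:

import requests
import pyttsx3
from lxml import etree

def broadcast_weather(city_slug='xiamen'):
    # build the target url from the same pattern used in this article
    url = f'https://www.tianqi.com/{city_slug}/'
    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Safari/537.36'}
    resp = requests.get(url, headers=headers, timeout=10)
    if resp.status_code != 200:
        raise RuntimeError(f'request failed with status {resp.status_code}')
    # parse the page and pull out the text fragments under the weather_info list
    html = etree.HTML(resp.text)
    fragments = html.xpath("//dl[@class='weather_info']//text()")
    txt = 'Welcome to Weather broadcast Assistant' + ''.join(f.strip() for f in fragments)
    txt = txt.replace('[switch cities]', '')
    # read the assembled text aloud
    engine = pyttsx3.init()
    engine.say(txt)
    engine.runAndWait()

if __name__ == '__main__':
    broadcast_weather()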
Thank you for reading. The above is the content of "Python Crawler Voice Broadcast Weather Forecast Case Analysis". After studying this article, I believe you have a deeper understanding of this case, although the specific usage still needs to be verified in practice. The editor will push more articles on related knowledge points for you; welcome to follow!