In addition to Weibo, there is also WeChat
Please pay attention
WeChat public account
Shulou
2025-02-24 Update From: SLTechnology News&Howtos shulou NAV: SLTechnology News&Howtos > Development >
Share
Shulou(Shulou.com)06/03 Report--
This article mainly explains "what is the method of formatting web page text by python". The content in the article is simple and clear, and it is easy to learn and understand. Please follow the editor's train of thought to study and learn "what is the method of python formatting web page text".
1. A web page usually contains text information. For different text types, we can choose the appropriate HTML semantic elements for tagging.
2. The em element is used to mark and emphasize parts of the content, and the small element is used for comments and signed text.
The theory of examples comes first in pragmatics.
Author: Confucius 1 (September 28, 551-April 11, 479)
This quotation
"Learning and" is the title of the first article in the Analects of Confucius. The first two or three words of the first chapter are generally used as the title of each article in the Analects of Confucius. "Xueer" consists of 16 chapters, covering many aspects. The key points are "three self-examination of my day", "economical use and love, so that the people can keep up with the time", "the use of propriety, harmony is precious" and moral categories such as benevolence, filial piety and faith.
Original text
The Master said, "Don't you also say that you learn and practice it from time to time? Isn't it a pleasure to have friends coming from afar? Isn't it a gentleman if you don't know it, but don't you think it's a gentleman? "
Knowledge point expansion:
Transformation between Python int and string
String- > int
1, decimal string converted to int
Int ('12')
2. Convert hexadecimal string to int
Int ('12, 16)
Int- > string
1. Convert int to decimal string
Str (18)
2. Convert int to hexadecimal string
Hex (18)
2. Since when Lianjia selected the second page online, he only added a "D2" at the back of the page, such as http://sh.lianjia.com/ershoufang/pudong/d2, so if you want to crawl more pages, you only need to update the requests page URL in a loop.
3. After adding a loop, you can print all the crawl results
From lxml import etreeimport requestsimport stringurl = 'http://sh.lianjia.com/ershoufang/'region =' pudong'price = 'p23'finalURL = url+region+pricedef spider_room (finallyURL): r = requests.get (finallyURL) html = requests.get (finalURL) .content.decode (' utf-8') dom_tree = etree.HTML (html) # all the messages all_message = dom_tree.xpath ("/ / ul [@ class='js_fang_list'] / li") for index in Range (len (all_message)): print (all_ message [index] .XPath ('string (.)'). Strip () returnfor i in range (20): finallyURL = finalURL +'/ d'+str (I) spider_room (finallyURL)
4. 20 pages of content have been crawled, but the form of the output of the content has not changed.
Thank you for your reading, the above is the content of "what is the method of formatting web page text by python". After the study of this article, I believe you have a deeper understanding of what the method of formatting web page text by python is, and the specific use needs to be verified in practice. Here is, the editor will push for you more related knowledge points of the article, welcome to follow!
Welcome to subscribe "Shulou Technology Information " to get latest news, interesting things and hot topics in the IT industry, and controls the hottest and latest Internet news, technology news and IT industry trends.
Views: 0
*The comments in the above article only represent the author's personal views and do not represent the views and positions of this website. If you have more insights, please feel free to contribute and share.
Continue with the installation of the previous hadoop.First, install zookooper1. Decompress zookoope
"Every 5-10 years, there's a rare product, a really special, very unusual product that's the most un
© 2024 shulou.com SLNews company. All rights reserved.