Shulou (Shulou.com), SLTechnology News&Howtos, 06/01 report; updated 2025-01-28
This article walks through worked examples of bs4 parsing and XPath parsing in Python. I find both very practical, so I am sharing them here as a reference; follow along and have a look.
Principle of bs4 parsing:
1. Instantiate a BeautifulSoup object and load the page source data into it.
2. Locate tags and extract data by calling the relevant attributes or methods of the BeautifulSoup object.
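The two steps above can be sketched as follows. This is a minimal illustration assuming beautifulsoup4 and lxml are installed; the HTML string is made up for the example.

```python
from bs4 import BeautifulSoup

# Made-up page source standing in for a local or downloaded HTML document
html = """
<html><body>
  <div class="song"><p>Du Fu</p></div>
</body></html>
"""

# Step 1: instantiate BeautifulSoup and load the page source into it
soup = BeautifulSoup(html, "lxml")

# Step 2: locate a tag through the object's methods/attributes, then extract data
poet = soup.find("div", class_="song").p.get_text()
print(poet)  # Du Fu
```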
How to instantiate a BeautifulSoup object:
from bs4 import BeautifulSoup
BeautifulSoup(markup, parser)
The first argument is the markup, either an open file object or a string of HTML; the second is the parser name, usually 'lxml' (which requires the lxml package to be installed).
Instantiating the object:
1. Load data from a local HTML document into the object:
fp = open('./test.html', 'r', encoding='utf-8')
soup = BeautifulSoup(fp, 'lxml')
2. Load page source fetched from the Internet into the object:
page_text = response.text
soup = BeautifulSoup(page_text, 'lxml')
In what follows, soup refers to the initialized BeautifulSoup object.
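A self-contained sketch of both ways of instantiating, assuming beautifulsoup4 and lxml are installed. Since there is no real test.html or response object here, the example writes a made-up page to a temporary file and uses a plain string in place of response.text.

```python
import os
import tempfile

from bs4 import BeautifulSoup

# Made-up page; stands in for both test.html and response.text
page_text = "<html><head><title>test</title></head><body><p>hello</p></body></html>"

# 1. Local document: write a hypothetical test.html, then pass the open file
path = os.path.join(tempfile.mkdtemp(), "test.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(page_text)

with open(path, "r", encoding="utf-8") as fp:
    soup_from_file = BeautifulSoup(fp, "lxml")

# 2. Source fetched from the network: here page_text plays the role of response.text
soup_from_text = BeautifulSoup(page_text, "lxml")

print(soup_from_file.title.string)  # test
print(soup_from_text.p.string)      # hello
```

Opening the file in a with block closes it once parsing is done, which the bare open() call above leaves to the caller.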
Methods and attributes for data parsing:
1. soup.tagName: returns the first tag named tagName that appears in the document.
2. soup.find():
(1) find('tagName'): equivalent to soup.tagName.
(2) Attribute positioning: soup.find('div', class_='song'), or id='song', or any other attribute='song'.
This locates a tag such as <div class="song"> or <div id="song">.
class needs the trailing underscore (class_) because class on its own is a Python keyword.
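A small sketch of soup.tagName and find(), using made-up markup (assumes beautifulsoup4 and lxml are installed):

```python
from bs4 import BeautifulSoup

html = """
<div class="tang">first div</div>
<div class="song">class song</div>
<div id="song">id song</div>
"""
soup = BeautifulSoup(html, "lxml")

# soup.tagName and soup.find('tagName') both return the FIRST matching tag
first_div = soup.div

# Attribute positioning; class_ carries an underscore because class is a Python keyword
by_class = soup.find("div", class_="song")
by_id = soup.find("div", id="song")

print(first_div.text, "|", by_class.text, "|", by_id.text)
# first div | class song | id song
```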
3. soup.find_all('tagName'): returns all matching tags as a list.
Text of the soup object for the sample test.html (a page of classical Chinese poems): During the Qingming Festival the rain falls in a drizzle, and travelers on the road are heartbroken. I ask where a tavern can be found; the shepherd boy points to Apricot Blossom Village in the distance. The bright moon of Qin, the border passes of Han: those who marched ten thousand li have not returned. If only Li Guang, the Flying General of the Han, were still here, the enemy cavalry would never cross the Yin Mountains. In Prince Qi's mansion I often saw you; in front of Cui Jiu's hall I heard you sing many times. Now the scenery south of the Yangtze is at its finest, and in this season of falling flowers I meet you again. Du Fu, Du Mu, Du Xiaoyue, Du Miyue ("honeymoon"). On Phoenix Terrace phoenixes once roamed; the phoenixes gone, the terrace stands empty and only the river flows on. The flowers and grasses of the Wu palace bury its secluded paths; the nobles of the Jin dynasty have become ancient mounds.
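find_all() on a made-up list of links, as a sketch (assumes beautifulsoup4 and lxml are installed). The result is a ResultSet, which behaves like a list:

```python
from bs4 import BeautifulSoup

html = """
<ul>
  <li><a href="http://example.com/1">Du Fu</a></li>
  <li><a href="http://example.com/2">Du Mu</a></li>
  <li><a href="http://example.com/3">Li Bai</a></li>
</ul>
"""
soup = BeautifulSoup(html, "lxml")

links = soup.find_all("a")          # every <a> tag in document order
names = [a.text for a in links]
print(names)  # ['Du Fu', 'Du Mu', 'Li Bai']
```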
4. select:
select('selector') takes a CSS selector (id, class, tag, and so on) and returns a list.
A tag selector is written as-is, a class selector is prefixed with '.', and an id selector with '#'.
Hierarchical selectors:
'>' denotes exactly one level; a space denotes any number of levels.
soup.select('.tang > ul > li > a')
soup.select('.tang > ul a')
For the sample page both return the same list of <a> tags, whose text is: [During the Qingming Festival the rain falls in a drizzle, and travelers on the road are heartbroken. I ask where a tavern can be found; the shepherd boy points to Apricot Blossom Village in the distance. The bright moon of Qin, the border passes of Han: those who marched ten thousand li have not returned. If only Li Guang, the Flying General of the Han, were still here, the enemy cavalry would never cross the Yin Mountains. In Prince Qi's mansion I often saw you; in front of Cui Jiu's hall I heard you sing many times. Now the scenery south of the Yangtze is at its finest, and in this season of falling flowers I meet you again. Du Fu, Du Mu. On Phoenix Terrace phoenixes once roamed; the phoenixes gone, the terrace stands empty and only the river flows on. The flowers and grasses of the Wu palace bury its secluded paths; the nobles of the Jin dynasty have become ancient mounds.]
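A sketch of the selector kinds on made-up markup (assumes beautifulsoup4 and lxml are installed). The second link is deliberately nested inside a <span> so that '>' and the space give different results; when every <a> is a direct child of its <li>, the two hierarchical selectors coincide, which is presumably the case on the article's sample page.

```python
from bs4 import BeautifulSoup

html = """
<div class="tang">
  <ul>
    <li><a href="http://example.com/a">Du Fu</a></li>
    <li><span><a href="http://example.com/b">Du Mu</a></span></li>
  </ul>
</div>
<p id="note">note</p>
"""
soup = BeautifulSoup(html, "lxml")

print(soup.select("a")[0].text)       # tag selector: Du Fu
print(soup.select(".tang")[0].name)   # class selector: div
print(soup.select("#note")[0].text)   # id selector: note

# ">" = exactly one level down; a space = any number of levels down
direct = soup.select(".tang > ul > li > a")
anywhere = soup.select(".tang > ul a")
print([a.text for a in direct])    # ['Du Fu']
print([a.text for a in anywhere])  # ['Du Fu', 'Du Mu']
```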
5. Getting the text between tags:
soup.a.text / soup.a.string / soup.a.get_text()
text / get_text(): return all the text inside a tag, including text in nested tags.
string: returns only the text directly inside the tag.
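The difference between text/get_text() and string shows up as soon as a tag has more than one child. A sketch with made-up markup (assumes beautifulsoup4 and lxml are installed):

```python
from bs4 import BeautifulSoup

html = '<div class="song"><p>Du Fu</p>Tang poet</div>'
soup = BeautifulSoup(html, "lxml")

div = soup.find("div", class_="song")
print(div.text)            # Du FuTang poet  (all text, including the nested <p>)
print(div.get_text(" "))   # Du Fu Tang poet (same text, joined with a separator)
print(div.string)          # None: the div has more than one child
print(div.p.string)        # Du Fu: the text directly inside <p>
```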
6. Getting an attribute value from a tag:
soup.a['attributeName']
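A sketch of reading attributes from a made-up link (assumes beautifulsoup4 and lxml are installed). Indexing with ['name'] raises KeyError when the attribute is missing, while .get('name') returns None:

```python
from bs4 import BeautifulSoup

html = """
<div class="tang"><ul>
  <li><a href="http://www.baidu.com" title="first">Baidu</a></li>
</ul></div>
"""
soup = BeautifulSoup(html, "lxml")

link = soup.select(".tang > ul > li > a")[0]
print(link["href"])         # http://www.baidu.com
print(link.get("title"))    # first
print(link.get("target"))   # None, where link["target"] would raise KeyError
print(soup.a.attrs)         # {'href': 'http://www.baidu.com', 'title': 'first'}
```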
print(soup.select('.tang > ul > li > a')[0]['href'])
Result: www.baidu.com
xpath parsing
XPath parsing is one of the most commonly used, convenient, and efficient parsing methods, and it is highly versatile.
Principle of xpath parsing:
1. Instantiate an etree object and load the page source to be parsed into it.
2. Call the etree object's xpath method with an XPath expression to locate tags and capture their content.
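The two steps can be sketched with lxml as follows; the page text is made up, and lxml is assumed to be installed:

```python
from lxml import etree

page_text = """
<html><body>
  <div class="song"><p>Du Fu</p><p>Du Mu</p></div>
</body></html>
"""

# Step 1: instantiate an etree object with the page source loaded into it
tree = etree.HTML(page_text)

# Step 2: call xpath() with an expression; the result is always a list
poets = tree.xpath('//div[@class="song"]/p/text()')
print(poets)  # ['Du Fu', 'Du Mu']
```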
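Instantiating an etree object:
from lxml import etree
-1. Load source data from a local HTML document into the etree object:
etree.parse(filePath)
(etree.parse uses an XML parser by default; for HTML that is not well-formed XML, pass an HTML parser: etree.parse(filePath, etree.HTMLParser()))
-2. Load source data obtained from the Internet into the object:
etree.HTML(page_text)
Then call xpath on the resulting object: tree.xpath('xpath expression')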
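A sketch of both loading paths, assuming lxml is installed. The example writes a made-up page to a temporary file in place of a real test.html, and passes etree.HTMLParser() to etree.parse() so that ordinary HTML is accepted:

```python
import os
import tempfile

from lxml import etree

html = "<html><body><div class='song'><p>Li Bai</p></div></body></html>"

# Write a hypothetical local test.html so the example is self-contained
path = os.path.join(tempfile.mkdtemp(), "test.html")
with open(path, "w", encoding="utf-8") as f:
    f.write(html)

# 1. Local file: etree.parse() defaults to an XML parser, so pass an HTMLParser
file_tree = etree.parse(path, etree.HTMLParser())

# 2. Network source: etree.HTML() takes the page text itself (a variable, not a quoted name)
text_tree = etree.HTML(html)

print(file_tree.xpath("//p/text()"))  # ['Li Bai']
print(text_tree.xpath("//p/text()"))  # ['Li Bai']
```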
XPath expressions (xpath() always returns a list):
-/: at the start, locates from the root node; between steps, it denotes one level.
-//: denotes multiple levels; at the start, it locates from any position.
-Attribute positioning: //div[@class='song'], general form tag[@attrName='attrValue']
-Index positioning: //div[@class='song']/p[3]
The index starts at 1.
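The path, attribute, and index rules can be checked against a small made-up page (a sketch assuming lxml is installed):

```python
from lxml import etree

tree = etree.HTML("""
<html><body>
  <div class="song">
    <p>p1</p><p>p2</p><p>p3</p>
  </div>
  <div class="tang"><p>other</p></div>
</body></html>
""")

# "/" from the root, one level per step
print(len(tree.xpath("/html/body/div")))                   # 2
# "//" spans any number of levels
print(len(tree.xpath("//p")))                              # 4
# Attribute positioning: tag[@attrName="attrValue"]
print(tree.xpath('//div[@class="song"]')[0].get("class"))  # song
# Index positioning starts at 1, so p[3] is the third <p>
print(tree.xpath('//div[@class="song"]/p[3]')[0].text)     # p3
```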
Getting text:
/text() gets the text directly inside the tag.
//text() gets all text inside the tag, including non-direct text in nested tags.
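/text() versus //text() on a made-up tag whose text is split around a nested <b> (assumes lxml is installed):

```python
from lxml import etree

tree = etree.HTML('<div class="song">Tang <b>poet</b> Du Fu</div>')

# /text(): only the text nodes that are direct children of the div
direct = tree.xpath('//div[@class="song"]/text()')
# //text(): every text node below the div, including the one inside <b>
all_text = tree.xpath('//div[@class="song"]//text()')

print(direct)    # ['Tang ', ' Du Fu']
print(all_text)  # ['Tang ', 'poet', ' Du Fu']
```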
Getting attributes:
/@attrName
e.g. //img/@src
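A sketch of pulling attribute values with /@attrName from made-up markup (assumes lxml is installed). The result is a list of strings, not elements:

```python
from lxml import etree

tree = etree.HTML("""
<div class="pic">
  <img src="/img/1.jpg" alt="one">
  <img src="/img/2.jpg" alt="two">
</div>
""")

srcs = tree.xpath("//img/@src")
alts = tree.xpath('//div[@class="pic"]/img/@alt')
print(srcs)  # ['/img/1.jpg', '/img/2.jpg']
print(alts)  # ['one', 'two']
```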
./ starts locating from the current node (used for local parsing).
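Local parsing matters when looping over elements: inside the loop, ./ anchors the expression at the current element. A sketch on made-up markup (assumes lxml is installed):

```python
from lxml import etree

tree = etree.HTML("""
<ul>
  <li><a href="/dufu">Du Fu</a></li>
  <li><a href="/dumu">Du Mu</a></li>
</ul>
""")

pairs = []
for li in tree.xpath("//ul/li"):
    # "./" starts from the current <li>, not from the document root
    name = li.xpath("./a/text()")[0]
    href = li.xpath("./a/@href")[0]
    pairs.append((name, href))
print(pairs)  # [('Du Fu', '/dufu'), ('Du Mu', '/dumu')]

# Pitfall: "//a" called on a single <li> still searches the whole document
print(len(tree.xpath("//ul/li")[0].xpath("//a")))  # 2, not 1
```

Because // always searches from the document root, even when called on a sub-element, ./ (or .//) is what keeps each lookup local.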
Multiple XPath expressions can be combined with |:
tree.xpath('//div[@class="song"]/p[3] | //div[@class="song"]')
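A sketch of | on made-up markup (assumes lxml is installed). Note the double quotes inside the single-quoted Python string, which keep the attribute value from terminating the string early:

```python
from lxml import etree

tree = etree.HTML("""
<div class="song"><p>p1</p><p>p2</p><p>p3</p></div>
<div class="tang"><p>t1</p></div>
""")

# "|" merges the node sets of both expressions; lxml returns them in document order
nodes = tree.xpath('//div[@class="song"]/p[3] | //div[@class="tang"]/p')
texts = [n.text for n in nodes]
print(texts)  # ['p3', 't1']
```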
Thank you for reading! This concludes "sample analysis of bs4 parsing and xpath parsing in python". I hope the content helps you learn something new; if you found the article useful, feel free to share it so more people can see it.
© 2024 shulou.com SLNews company. All rights reserved.