2025-01-19 Update From: SLTechnology News&Howtos
Shulou (Shulou.com) 06/02 Report
This article explains in detail the differences between text() and string(.) in XPath. The editor finds the topic quite practical and shares it here for reference; I hope you get something out of it.
When crawling, we often encounter pages like this:
<div>
<em>Hello</em>, Beijing
</div>
In everyday crawling, the more common case is extracting the text inside <em>, which you can do with xpath("//div/em/text()").
Now consider the following two extraction requirements:
Requirement 1: to extract "Beijing", should we use text() or string(.)?
Requirement 2: how do we extract "Hello, Beijing"?
Let's load the page with the lxml library (if you are using Scrapy's xpath selector, you can follow the same steps):
from lxml import etree
with open('foo.html', encoding='utf-8') as f:
    content = f.read()
page = etree.HTML(content)
Let's look first at requirement 1, extracting "Beijing":
re = page.xpath("//div/text()")
Here re is a list of two strings, "\n" and ", Beijing\n". This is because there is a newline between the <div> and <em> tags on the page, and "//div/text()" returns only the text nodes directly under <div>, skipping the <em> element and the text inside it.
Take the second element of re and strip the trailing newline "\n" (if you are using Scrapy's xpath, re may not be a list):
re = re[1].strip()
re is now ", Beijing". The text node keeps the comma that follows </em>, so strip it too, for example with re.lstrip(", "), to be left with the "Beijing" we need.
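As a minimal sketch of requirement 1, the snippet below parses the markup from an inline string instead of foo.html. The HTML string is an assumption reconstructed from the description above, and the names HTML, parts, em_text, and city are illustrative only.

```python
from lxml import etree

# Assumed markup: a <div> whose <em> child is followed by a text node.
HTML = "<div>\n<em>Hello</em>, Beijing\n</div>"

page = etree.HTML(HTML)

# text() selects only the text nodes that are direct children of <div>.
parts = page.xpath("//div/text()")
print(parts)

# The text inside <em> has to be reached through its own path.
em_text = page.xpath("//div/em/text()")
print(em_text)

# Strip whitespace first, then the comma and space left over after </em>.
city = parts[1].strip().lstrip(", ")
print(city)
```

Note that lstrip(", ") removes any leading commas and spaces, which is enough here but would need adjusting for other punctuation.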
Now for the second requirement, extracting "Hello, Beijing".
This requires the text inside <em> as well, so we use string(.):
re = page.xpath("//div")[0].xpath("string(.)")
Now look at the value of re. (Again, if you use Scrapy's selector, the result of scrapy_selector.xpath("//div") may not be a list, but you can still take the result and call .xpath("string(.)") on it.)
The result is a single string, "\nHello, Beijing\n".
With string(.), XPath returns all of the text as one string instead of skipping child elements and splitting the rest into a list as text() does. Note that string(.) must be evaluated in its own xpath() call on the selected node. Writing "//div/string(.)" does not work: lxml's XPath engine follows XPath 1.0, which does not allow a function call as a path step.
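The note above can be checked with a small sketch, again assuming the inline markup below stands in for foo.html. The single-expression form string(//div) is an addition not used in the article; it passes the path as the function's argument, and string() then returns the string value of the first matching node. The variable names are illustrative.

```python
from lxml import etree

# Assumed markup standing in for foo.html.
HTML = "<div>\n<em>Hello</em>, Beijing\n</div>"
page = etree.HTML(HTML)

# Two-step form from the article: select the node, then evaluate string(.).
two_step = page.xpath("//div")[0].xpath("string(.)")

# Single-expression form: the path is the argument of string().
one_step = page.xpath("string(//div)")

# A function call used as a path step is rejected by lxml.
try:
    page.xpath("//div/string(.)")
    step_error = None
except etree.XPathEvalError as exc:
    step_error = exc

print(repr(two_step))
print(repr(one_step))
print(step_error)
```

Both working forms return the same string, so the choice between them is mostly a matter of whether the node has already been selected.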
Then, as before, strip the extra whitespace and newline characters on both sides:
re = re.strip()
Now re is "Hello, Beijing".
Summary:
The experiments above show that text() in XPath takes only the text nodes directly under the selected node and returns them as a list, split wherever a child tag occurs, while string(.) concatenates all the text in the current node and its descendants into a single string.
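A short side-by-side sketch of the summary, under the same assumption about the markup. The ".//text()" expression is an addition not covered in the article: it selects text nodes at every depth, and joining them gives the same string as string(.).

```python
from lxml import etree

# Assumed markup standing in for foo.html.
HTML = "<div>\n<em>Hello</em>, Beijing\n</div>"
div = etree.HTML(HTML).xpath("//div")[0]

direct = div.xpath("text()")        # text nodes directly under <div>
all_nodes = div.xpath(".//text()")  # text nodes at any depth, in document order
whole = div.xpath("string(.)")      # all text joined into one string

print(direct)
print(all_nodes)
print(repr(whole))
```

The list forms are useful when each fragment needs separate handling; string(.) is simpler when only the combined text matters.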
Example code:
test.py:
# coding=utf-8
from lxml import etree


class Test(object):
    def __init__(self):
        # Parse foo.html into an lxml element tree.
        with open('foo.html', encoding='utf-8') as f:
            content = f.read()
        self.page = etree.HTML(content)
        print(self.page)

    def xpath_text(self):
        # text() returns the direct text nodes of <div> as a list.
        re = self.page.xpath("//div/text()")
        print(re)
        re = re[1].strip()
        print(re)
        return re

    def xpath_string(self):
        # string(.) joins all text in <div> and its descendants.
        re = self.page.xpath("//div")[0].xpath("string(.)")
        print(re)
        # Strip surrounding whitespace and newline characters.
        re = re.strip()
        print(re)
        return re


if __name__ == "__main__":
    t = Test()
    assert t.xpath_text() == ", Beijing"
    assert t.xpath_string() == "Hello, Beijing"
foo.html:
<div>
<em>Hello</em>, Beijing
</div>